PLOS One. 2020 Aug 31;15(8):e0238373. doi: 10.1371/journal.pone.0238373

A preregistered multi-lab replication of Maier et al. (2014, Exp. 4) testing retroactive avoidance

Markus A Maier 1,*, Vanessa L Buechner 1, Moritz C Dechamps 1, Markus Pflitsch 1, Walter Kurzrock 1, Patrizio Tressoldi 2, Thomas Rabeyron 3,4, Etzel Cardeña 5, David Marcusson-Clavertz 5,6, Tatiana Martsinkovskaja 7
Editor: Florian Naudet 8
PMCID: PMC7458331  PMID: 32866215

Abstract

The term “retroactive avoidance” refers to a special class of effects of future stimulus presentations on past behavioral responses. Specifically, it refers to the anticipatory avoidance of aversive stimuli that were unpredictable through random selection after the response. This phenomenon is supposed to challenge the common view of the arrow of time and the direction of causality. Preliminary evidence of “retroactive avoidance” has been published in mainstream psychological journals and started a heated debate about the robustness and the true existence of this effect. A series of seven experiments published in 2014 in the Journal of Consciousness Studies (Maier et al., 2014) tested the influence of randomly drawn future negative picture presentations on avoidance responses based on key presses preceding them. The final study in that series used a sophisticated quantum-based random stimulus selection procedure and implemented the most severe test of retroactive avoidance within this series. Evidence for the effect, though significant, was meager and anecdotal, Bayes factor (BF10) = 2. The research presented here represents an attempt to exactly replicate the original effect with a high-power (N = 2004) preregistered multi-lab study. The results indicate that the data favored the null effect (i.e., absence of retroactive avoidance) with a BF01 = 4.38. Given the empirical strengths of the study, namely its preregistration, multi-lab approach, high power, and Bayesian analysis used, this failed replication questions the validity and robustness of the original findings. Not reaching a decisive level of Bayesian evidence and not including skeptical researchers may be considered limitations of this study. Exploratory analyses of the change in evidence for the effect across time, performed on a post-hoc basis, revealed several potentially interesting anomalies in the data that might guide future research in this area.

Introduction

In 2014, the Journal of Consciousness Studies published an article by Maier and colleagues [1] that reported seven experiments testing the effects of randomly selected stimulus presentations on behavioral decisions that had occurred before those presentations. Specifically, they tested the influence of randomly drawn future negative picture presentations on avoidance responses preceding them. Both picture processing and response selection were maintained at nonconscious levels. The authors hypothesized that if individuals anticipate the—albeit random—future outcome of a binary choice, they will unconsciously select the less aversive outcome. In four of the seven experiments, the authors observed the predicted effect, while two showed null findings and one a statistical trend. In a meta-analytic summary of the seven studies, they reported a significant but small overall effect, ES = 0.07, z = 3.79, p < .0001; combined Bayes factor (BF10) = 293 (extreme evidence for H1), indicating greater-than-chance avoidance of negative picture presentations. The empirical quality of the data was highest in the final experiment of the series, which applied a sophisticated method involving a quantum-based random number generator for response key and picture assignment and included a powerful sample size. In this rigorous study, a significant effect was also observed, but only with a BF10 = 2, considered as merely anecdotal evidence for H1. Despite the cumulative results, this latter finding cast some doubt on the reproducibility of this effect. Thus, an exact and powerful replication was performed to clarify this and will be reported herein.

The phenomenon addressed in the studies mentioned above is called “retroactive avoidance” and belongs to a broader class of psychological responses denoted by the terms “precognition” or “retroactive influence”. Similar effects from a similar design using unconscious picture processing but conscious response selection have been reported by Bem ([2] Exp. 2), and evidence for the existence of precognition was documented in a recent meta-analysis by Bem, Tressoldi, Rabeyron, and Duggan [3] that included 90 experiments run between 2001 and 2013, many of which were direct replications of one of Bem’s [2] studies. This meta-analysis suggested that an effect could primarily be located within paradigms that involve nonconscious stimulus processing, so-called “fast thinking” protocols, whereas no overall effect was observed with “slow thinking” paradigms, confirming the findings of Galak, LeBoeuf, Nelson, and Simmons [4] and Ritchie, Wiseman, and French [5].

Although Maier et al.’s [1] findings have not been discussed, Bem’s original findings have been extensively debated and critiqued. Questionable research practices, such as p-hacking and publication bias as a potential source of Type I error, have been addressed by Francis [6] and Schimmack [7]. The usefulness of frequentist methods for analyzing the results, given the low power of the original studies, has been questioned by Wagenmakers, Wetzels, Borsboom, and Van der Maas [8] and Rouder and Morey [9], with both teams suggesting Bayesian analyses (see, however, the re-analysis by Bem, Utts, and Johnson [10], which found partial support using Bayesian analyses). Given these methodological weaknesses, which also affected a large portion of empirical findings in psychology at that time, LeBel and Peters [11] recommended a stronger emphasis on exact replications. Although these criticisms cannot be applied to the work of Maier et al. [1], we agree with this last proposition. In addition to meta-analyses, each effect obtained with one specific paradigm should be tested for replicability within this paradigmatic framework (see also [12]). Since the most conservative study with regard to the randomization procedures used—Maier et al.’s Experiment 4 [1]—revealed an effect that was significant but weak and only anecdotally relevant in terms of Bayesian evidence, we followed this suggestion and pursued a preregistered multi-lab attempt to replicate [1] (Exp. 4) based on Bayesian analyses.

Before we go into a description of the actual study, we will briefly mention the theoretical background of the original work of [1] and of this study in order to explain why the original authors focused on unconscious processing. The original authors derived their hypothesis from a group of interpretations that link the emergence of conscious moments from unconscious processing to quantum theory. Central to their argument were the orchestrated objective reduction (OrchOR) philosophy of mind of Roger Penrose and Stuart Hameroff [13–17] and the generalized quantum theory (GQT) put forward by Harald Atmanspacher, Hartmut Römer, Walter von Lucadou, Thomas Filk, and Harald Walach [18–23]. Both theories link unconscious processing to properties of quantum states and conscious moments to the process of measurement, including the establishment of local reality (note that GQT only applies the mathematical rules of quantum mechanics to these types of processes but makes no assumption about the exact physical nature of these processes). Within the unconscious realm, information processing follows quantum rules that involve superpositions [24], as well as spatial [25] and temporal non-locality [14, 26–29]. The latter could be explained by the hypothesis that the arrow of time may be bidirected at the quantum level and that the directed flow of time, from the past via the present to the future, only occurs in classical reality [14]. These phenomena are nevertheless unrestricted by the Planck constant, and, following [13], Maier et al. [1] argued that, if these interpretations are true, they can be extended to the macroscopic level. Regarding the possibility of empirically observing retroactive avoidance, both theories suppose the existence of such effects when studied with paradigms that test unconscious processing (see also [30–32]).
It has to be emphasized that these interpretations of quantum mechanics are favored only by a minority of experts in the field and they are still debated amongst all the other interpretations that have been put forward.

We will now describe the design of the present replication study, which followed the protocol of the original study ([1] Exp. 4) exactly. The main goal of this study was to test the replicability of the retroactive avoidance effect with high statistical power together with a Bayesian analysis of the results, as suggested by [8]. All procedural details and statistical methods used have been preregistered on OSF and can be found here: https://osf.io/yqqfz. The participating labs all confirmed that they used the protocol exactly as outlined in the preregistration. To make sure that all labs followed the exact same procedure, VLB informed all collaborators through extensive personal (phone or meetings) and email communications about all relevant procedural details. She also sent a written “instruction for experimenters” to all collaborators to make sure that they closely followed the original procedure. The instruction document and the E-Prime code can be found here: https://osf.io/yqqfz/files/.

All labs used masked experimenters (with one short-term exception; see below) and sent the raw data for analyses to the main author. All incoming data were subsequently entered into a sequential Bayesian analysis. The final BF score will be reported as evidence for or against the alternative hypothesis. In addition, post-hoc analyses of the variations in the evidence for the effect across labs and of the overall effect across time will be provided for exploratory reasons.

Method

All research presented in this article involved human participants, and the protocol was approved by the respective ethical boards of the participating universities. The following ethical boards reviewed and approved the protocol: Ethikkommission der Fakultät 11 (Psychologie & Pädagogik) an der LMU München (Germany); Comitato Etico Della Ricerca Psicologica, Università di Padova (Italy); Comité d’éthique de l’université de Nantes, Nantes (France); Regionala Etikprövningsnämnden i Lund (Sweden); Ethical Committee on Psychology, Institute of Psychology named after L.S. Vygotsky, RGGU (RSUH), Moscow (Russia). Written consent was obtained from all participants. The study was an exact replication of [1] (Exp. 4), with study details and main analysis preregistered at OSF before the start of data collection. It has to be noted that the text of the preregistration had been created and stored before the beginning of the data collection (in September 2013); however, the first author was not aware of the fact that it had to be frozen to complete the procedure. This was done a few months later (in July 2014) without changing any wording of the original text. Thus, the preregistration was finalized after data from 260 participants had already been collected and inspected for the first time. In addition, the study was originally not preregistered explicitly as a multi-lab project (but did not exclude this option either).

The experiment was run on a computer to which a response box or keyboard was attached. We tested retroactive avoidance of negative picture presentations. Both the avoidance response and the picture perception were kept unconscious. To reach this goal, in each trial participants were required to press two predefined response keys simultaneously. The response box (or keyboard) was designed so that one of the two keys always triggered first, regardless of the participants’ attempts to press the keys at exactly the same time. In this way, the participants unconsciously made a binary choice. For each trial, each key was randomly assigned to either a negative or neutral masked picture presentation (details below) that appeared after the key-press. Our directional hypothesis was that participants unconsciously anticipate the future outcomes of their actions and therefore would avoid negative pictures more often than expected by chance. This would replicate the original findings of [1] (Exp. 4).

Participants

The participants in this study comprised 2004 undergraduate and graduate students (1,345 females, 659 males; mean age = 23.41 years, SD = 6.57) who participated for course credit. We did not specify any exclusion criteria other than basic vision abilities and a minimum age of 18 years. Of these, 154 participants were attending the Institute of Psychology, RSUH, Moscow (Russia), 235 the University of Padua (Italy), 103 the University of Nantes (France), 99 Lund University (Sweden), and 1,413 Ludwig-Maximilians University of Munich (Germany). They were recruited through the departments’ announcement boards, online recruitment platforms, or handouts distributed during class. Students were told that their participation would involve up to three different experiments assessing psychological states, but no further study details were provided in the recruitment information.

Materials

Software and computer

The study was conducted using different computers and screens in different labs, all of which were equipped with Windows-run computers. For trial randomization, a quantum-based random number generator (QRNG) from Id Quantique was attached to the computer (see www.idquantique.com). This hardware device has passed both DIEHARD and NIST tests of randomness and is one of the most powerful means of generating true random numbers based on quantum superpositions [33] (see also various certificates from national agencies on their homepage). E-Prime 2.0 or jsPsych 5.0.3 software was used for response registration and picture presentation. Some labs used keyboards and others used response boxes (Black Box ToolKit USB response pad, Cedrus RB-740) for response registration. When keyboards were used, the left and right cursor keys served as response keys. When response boxes were used, the lower left and right buttons served as response keys.

Stimuli

The stimulus pictures used were subsets from the International Affective Picture System (IAPS) [34]. Ten extremely negative pictures with a mean valence of 1.73 (SD = 0.27) and ten neutral pictures with a mean valence of 4.90 (SD = 0.27) on a 9-point rating scale obtained from a normative sample were selected (for details, see [1] Appendix).

Experimenters

Only trained undergraduate research assistants were used as experimenters. They were masked with regard to the study goal and the pictures used in this experiment. For some of the participants from Lund University only, one of the authors (DM) served as experimenter.

Procedure

Each participant was tested individually or in a group session in a quiet lab room. In group sessions, individuals were separated by small side walls that prevented visual contact between them. Light was dimmed in the rooms. In some sessions, the retroactive avoidance study was the only study performed; in others, it was the final experiment in a series of up to three studies. These pre-studies were standard psychological experiments that varied across labs and time. The experimenters ensured that these were around 15 to 20 minutes in duration. In group sessions, all participants began each experiment at the same time. The focus study began with a written instruction presented on the screen:

In the following experiment, you must press two keys on the keyboard (or response box) as simultaneously as possible. You will see this instruction on the monitor’s screen:

Please press the keys.

When you see this instruction, please press both keys as simultaneously as possible!

Afterwards, colored stimuli will be presented, which you should simply watch.

As soon as the participants had read the instructions, the experimenter explained that they should gently place their index fingers on the keyboard’s left and right cursor keys or the left and right bottom keys on the response box. The experimenters immediately checked whether the participants had followed this instruction. The response device was placed on the table in front of the participant with the relevant response keys centered in the midpoint of the computer screen. The monitor was placed at a distance of about 50 cm from the participant. Experimenters emphasized that both index fingers should remain lightly touching the response keys throughout the experiment. The experimenters further instructed participants that, once the “Please press keys” command appeared, they should press both keys as simultaneously as possible. Participants were informed that this was not a speed task but that their responses should be spontaneous. After the key-press, they were asked to simply watch the stimuli presented on the screen following the response.

Each trial started with the key-press command presented on the screen. Once a response was performed, the command line disappeared and, after a 430 ms presentation of a black screen, a masked negative or neutral picture was presented. The masked picture presentation consisted of three consecutive stimulus presentations: a masking stimulus presented for 70 ms, followed by the presentation of a negative or neutral picture for 14 ms followed by the same mask for 70 ms. Each negative and neutral picture was combined with an individual mask. The mask was constructed by dividing each original picture into small squares that were randomly rearranged, forming a stimulus consisting of the same color and lightness properties as the original, but without any content. This effective masking should ensure a subliminal presentation of the original picture and was successfully used in [1]. After the second masking stimulus had disappeared, a 3000 ms black screen inter-trial interval appeared before the next trial was initiated by the key-press command line. A total of 60 trials were presented in this way. The main trials were preceded by three practice trials with neutral images, which helped the participants to familiarize themselves with the task.
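The mask construction described above can be sketched as follows. This is a minimal illustration only, assuming the picture is available as a 2D grid of pixel values whose height and width are multiples of the tile size; the function name `scramble_tiles`, the tile size, and the seeded pseudo-random shuffle are hypothetical choices and not details of the original stimulus preparation.

```python
import random

def scramble_tiles(pixels, tile, seed=0):
    """Cut a 2D pixel grid into tile x tile squares and rearrange them at random."""
    height, width = len(pixels), len(pixels[0])
    if height % tile or width % tile:
        raise ValueError("image size must be a multiple of the tile size")
    # Top-left corner of every square, in row-major order.
    corners = [(r, c) for r in range(0, height, tile)
               for c in range(0, width, tile)]
    shuffled = corners[:]
    random.Random(seed).shuffle(shuffled)
    out = [[None] * width for _ in range(height)]
    # Copy the square starting at (sr, sc) into the position (dr, dc).
    for (dr, dc), (sr, sc) in zip(corners, shuffled):
        for i in range(tile):
            for j in range(tile):
                out[dr + i][dc + j] = pixels[sr + i][sc + j]
    return out

# Toy 4 x 4 "image" with 2 x 2 tiles (illustrative values, not IAPS data).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = scramble_tiles(image, tile=2, seed=1)
```

Because the squares are only moved, never altered, the mask contains exactly the same pixel values as the source picture, which is what preserves its color and lightness properties.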

Although participants were told to press both keys simultaneously, given the design of the response devices used, one of the two keys always triggered first. Thus, in any given trial either a left or a right key-press was registered, even though participants subjectively performed two-key simultaneous responses. After each response registration, a randomization procedure took place. The QRNG that was connected to each computer randomly created a bit (0/1) during each trial immediately after response registration. Since this QRNG does not operate with a buffer, it was ensured that this actual bit was always created after the key-press. Prior to data collection, an additional randomization was prepared, in which each participant number was linked to a list of 60 bits—one for each trial—using a QRNG. The combination of the pre-stored bit for each trial and the bit actually created after the response registration then defined whether a negative or neutral masked picture appeared afterwards. Negative and neutral pictures were drawn without replacement until all ten pictures from a subset had been drawn (and the process then began again with the same set), leading to a maximum of six presentations of the same picture within a session of 60 trials (6 x 10). In this way, the consequence of each single response could not be classically or algorithmically anticipated by the participant. Any effects potentially observed could then only be explained by retroactive or unconscious precognitive effects from the future. Since QRNG-based picture selection was based on quantum mechanical outcomes, a true source of randomness was used. Null effects should thus lead to 50% negative and 50% neutral picture presentations on average across trials and participants.
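The logic of this randomization can be sketched in a few lines. The text does not state how the pre-stored bit and the live bit were combined or how the result was mapped onto keys, so the XOR combination and the key mapping below are assumptions made purely for illustration; the names `PictureDeck` and `run_session` are hypothetical, and Python's pseudo-random generator stands in for the QRNG.

```python
import random

class PictureDeck:
    """Draw pictures without replacement; start over once all were shown."""
    def __init__(self, pictures, rng):
        self.pictures = list(pictures)
        self.rng = rng
        self.remaining = []

    def draw(self):
        if not self.remaining:
            self.remaining = self.pictures[:]
            self.rng.shuffle(self.remaining)
        return self.remaining.pop()

def run_session(pressed_keys, stored_bits, live_bit, rng):
    """Return a list of (valence, picture) outcomes for one session."""
    negative = PictureDeck([f"neg{i}" for i in range(10)], rng)
    neutral = PictureDeck([f"neu{i}" for i in range(10)], rng)
    outcomes = []
    for key, stored in zip(pressed_keys, stored_bits):
        # The live bit is requested only after the key press was registered.
        combined = stored ^ live_bit()  # ASSUMPTION: bits combined by XOR
        # ASSUMPTION: the combined bit decides which key leads to a negative picture.
        negative_key = "left" if combined == 0 else "right"
        if key == negative_key:
            outcomes.append(("negative", negative.draw()))
        else:
            outcomes.append(("neutral", neutral.draw()))
    return outcomes

rng = random.Random(42)  # pseudo-random stand-in for the QRNG
keys = [rng.choice(["left", "right"]) for _ in range(60)]
stored = [rng.getrandbits(1) for _ in range(60)]
session = run_session(keys, stored, lambda: rng.getrandbits(1), rng)
negative_count = sum(1 for valence, _ in session if valence == "negative")
```

With ten pictures per set and 60 trials, no picture can appear more than six times in a session, matching the maximum stated above; `negative_count` is the per-participant score that enters the analysis.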

Results

The Results section consists of three parts. In the first, the main analysis tested the evidence for or against the hypothesis that the sample’s mean score of negative picture presentations would be lower than chance expectation (50%). This analysis was predefined in the preregistration, which proposed a one-tailed Bayesian one-sample t-test. The second part reports two additional analyses. First, analyses with wider priors test the robustness of the effect obtained in the main analysis; second, data from each lab using the original prior are presented separately. These analyses facilitated the exploration of the robustness and variations of the effect across different labs. In the third subsection, three exploratory analyses address the temporal variation of the sequential BF within the original Study 4 of [1] combined with the newly collected data in the study presented here. These analyses explored the temporal change in evidence for the effect across time (from the initial detection of the effect to the later replication attempt), as suggested by [35–37]; see also [23]. These analyses tested non-random fluctuations within the combined data sets against 10,000 simulated data sets. With the exception of the main analysis, no other analyses were part of the preregistration and are therefore purely post hoc and exploratory in nature.

Main analysis

Since a sequential Bayesian testing approach was applied in this study, the final sample size was not predefined. Instead, an accumulative data collection and analysis strategy using Bayesian inference techniques for hypothesis testing was used, as suggested by [8]. All 60 trials per participant were handled following the exact same protocol as the original study. That is, a mean score for each participant was computed and subsequently subjected to the Bayesian analysis. This approach allows for data accumulation (i.e., additional respondents can be tested and results added into the dataset) until a specified Bayes factor (BF) for H1 (or H0) has been reached. It also provides the option of ceasing data collection at a predetermined BF. We defined a BF of 10 as the stopping point for evidence for both H0 and H1. This research method is described in [8] (see also [35] for further details). The BF is arguably the best indicator of the evidence for an effect at any moment of data collection. This statistical test can be used sequentially and quantifies the relative support for the two competing hypotheses at each data point. The BF is consistent, which means it will give a more precise answer the more data it considers even if the null hypothesis is true [38]. It describes the relative amount of evidence that the data provide for or against a postulated effect. In this way, the existence (H1) and the non-existence (H0) of an effect can be tested. A BF of 10 or higher is considered to indicate strong evidence for H1 or H0, respectively. For instance, a BF10 = 10 means that the observed data are ten times more likely under H1 than under H0. All participating labs sent their incoming data at least once per semester to the authors from the LMU, who added the new data to the total data set and calculated the actual BF from the overall data collected up to that point.
This was repeated over several years, and the sequential change in the BF was observed closely during this time.
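The stopping rule described above can be written down compactly. The following is a minimal sketch, assuming the sequential BF10 values have already been computed after each new batch of data; the function name `sequential_decision` and the trajectory in the example are hypothetical and do not represent study data.

```python
def sequential_decision(bf10_values, threshold=10.0):
    """Return (n, decision) at the first point where the stopping rule is met."""
    for n, bf10 in enumerate(bf10_values, start=1):
        if bf10 >= threshold:
            return n, "stop: evidence for H1"
        # BF01 = 1 / BF10, so BF01 >= threshold means BF10 <= 1 / threshold.
        if bf10 <= 1.0 / threshold:
            return n, "stop: evidence for H0"
    return len(bf10_values), "continue: criterion not met"

# Hypothetical BF10 trajectory, for illustration only.
toy_curve = [1.0, 1.8, 3.2, 0.9, 0.4, 0.25]
result = sequential_decision(toy_curve)
```

A symmetric threshold treats evidence for H0 and H1 alike, which is why a single `threshold` argument suffices here.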

To calculate the BF, a probability distribution for effect size must be specified a priori. The effect size of the original study was Cohen’s d = 0.1. Typically, a Cauchy distribution centered around zero with a scale parameter r is used to describe this prior. This prior distribution, δ ~ Cauchy (0, r), determines the marginal likelihood of the data given that there is an effect, p(data|H1). Based on the original effect size, an r of 0.1, i.e., δ ~ Cauchy (0, 0.1), (see also [1]) was chosen a priori and specified in the preregistration. The entire procedure was also approved by Eric-Jan Wagenmakers via personal communication in 2013 (email correspondence from August 19, 2013). A one-sample t-test (one-tailed) was performed on a regular basis, at least once at the end of every semester during periods of data collection, to test whether the actual sample’s mean score of negative stimuli presentations was below chance (50%). For all Bayesian analyses, the statistical software tools R (Version 3.5.1) and JASP (Version 0.10.1 [39] and previous versions) were used. This was repeated over several years from October 2013 to March 2019. Although at this time the stopping criterion had not been met, all members of the participating research teams agreed to cease data collection primarily because the financial and human resources were exhausted. Consequently, the actual status of evidence for or against the effect will now be reported.

The final Bayesian one-sample t-test (one-tailed) with a total of 2004 participants revealed a BF10 = 0.23 (BF01 = 4.38), providing moderate evidence for the null hypothesis. The mean number of negative picture presentations per participant was M = 29.97, SD = 3.92 (chance expectation: 30 of 60 trials). Thus, against our prediction, the participants’ mean score of avoidance responses was not clearly below chance level. Fig 1 represents a sequential analysis of the BF across all participants in the temporal order of testing.
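To make the computation behind such a Bayes factor more concrete, the following sketch approximates a one-sided BF10 from the rounded summary statistics reported here. It is not the routine used by JASP or R: it assumes a normal approximation to the sampling distribution of the standardized effect (standard error 1/√n), a half-Cauchy prior with scale r = 0.1 restricted to the predicted direction, and simple midpoint integration. The function name and its parameters are hypothetical.

```python
import math

def bf10_one_sided(d_obs, n, r=0.1, upper=1.0, steps=20000):
    """Approximate one-sided BF10 for a standardized effect d_obs.

    ASSUMPTIONS: normal likelihood with standard error 1/sqrt(n);
    half-Cauchy(0, r) prior on delta > 0; midpoint-rule integration
    over (0, upper), beyond which the likelihood is negligible here.
    """
    se = 1.0 / math.sqrt(n)

    def likelihood(delta):
        z = (d_obs - delta) / se
        return math.exp(-0.5 * z * z)  # normalizing constant cancels in the ratio

    def prior(delta):
        return 2.0 / (math.pi * r * (1.0 + (delta / r) ** 2))

    h = upper / steps
    marginal_h1 = sum(
        likelihood((i + 0.5) * h) * prior((i + 0.5) * h) * h
        for i in range(steps)
    )
    return marginal_h1 / likelihood(0.0)

# Effect coded so that positive values mean fewer negative pictures than chance.
d_obs = (30 - 29.97) / 3.92
bf10 = bf10_one_sided(d_obs, n=2004)
bf01 = 1.0 / bf10
```

Under these assumptions the result is of the same order as the reported values; small deviations are to be expected because the inputs are rounded and the t likelihood is replaced by a normal approximation.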

Fig 1. Sequential Bayes factor curve for all 2004 participants in the temporal order of data collection.


Additional analyses

To better understand the exact nature of the main result, additional analyses were performed that were not part of the original preregistration.

Robustness analyses

When testing a directional hypothesis with a Bayesian analysis using a small r = 0.1 for the prior, δ ~ Cauchy (0, 0.1), it is difficult to reach a BF01 > 10 and thus to confirm the null hypothesis with a reasonable amount of data. Given the postulated meager effect size, a much larger sample size than that presented here would be required. Therefore, we also conducted robustness analyses using alternative priors provided by JASP to test the null effect revealed here. The results are illustrated in Fig 2.

Fig 2. Sequential Bayes Factor curves for different priors used.


As the graph illustrates, when wider priors were applied in the additionally performed Bayesian one-sample t-tests, final BF01 values > 30 were found, indicating very strong evidence for H0. These post-hoc analyses support our interpretation of the main analysis that a true null effect was detected there. The raw data can be found here: https://osf.io/yqqfz/files/.

Variations across labs

Next, several Bayesian one-sample t-tests (one-tailed) with the original prior were performed separately for each lab to test any variations in the effects between the different participating labs. Table 1 presents the sample size, the final BFs, mean scores, and standard deviations for each lab.

Table 1. Bayesian and descriptive analyses of the retroactive avoidance effect for the five individual participating labs.

Lab       N      BF10   BF01   Evidence            Mean    SD
Germany   1413   0.20   5      moderate for H0     30.00   3.88
Italy     235    0.31   3.26   moderate for H0     30.12   4.10
Russia    154    1.93   0.52   anecdotal for H1    29.49   3.83
France    103    1.19   0.84   anecdotal for H1    29.52   4.27
Sweden    99     0.40   2.50   anecdotal for H0    30.20   3.80

Sub-samples are ordered according to descending sample size.

As the table above indicates, none of the individual labs produced strong evidence for or against the effect. The strongest trend in line with the prediction was observed at the lab in Moscow. By contrast, the two largest sub-samples (from Munich and Padua) show moderate evidence for H0. To test the homogeneity of the results across labs, we also conducted a meta-analysis with a random-effects model. The heterogeneity estimates were not significant (tau = .0007; I² = .01%; Q(4) = 4.36, p = .36), indicating a rather homogeneous set of data. Mean overall effect size was ES = .008 (SE = .02, p = .76), also revealing a null finding.
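For readers unfamiliar with the heterogeneity statistic, the following sketch computes Cochran's Q and a pooled effect from the rounded per-lab values in Table 1. It is an approximation under stated assumptions, not the authors' analysis: each lab's effect is taken as d = (30 − mean) / SD, its sampling variance is approximated by 1/N, and inverse-variance (fixed-effect) weights are used. The estimates therefore need not match the reported random-effects results exactly.

```python
CHANCE = 30.0  # expected number of negative pictures in 60 trials

# (N, mean, SD) per lab, copied from Table 1 (rounded values).
labs = {
    "Germany": (1413, 30.00, 3.88),
    "Italy": (235, 30.12, 4.10),
    "Russia": (154, 29.49, 3.83),
    "France": (103, 29.52, 4.27),
    "Sweden": (99, 30.20, 3.80),
}

def cochran_q(lab_stats):
    """Return (pooled effect, Q, degrees of freedom, I^2)."""
    effects, weights = [], []
    for n, mean, sd in lab_stats.values():
        effects.append((CHANCE - mean) / sd)  # positive = avoidance
        weights.append(float(n))              # ASSUMPTION: var(d) ~ 1/n
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2

pooled, q, df, i2 = cochran_q(labs)
```

Q is compared against a chi-square distribution with `df` degrees of freedom; a Q close to its degrees of freedom, as here, indicates little heterogeneity between labs.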

In sum, the non-significant heterogeneity test and the low sample size in most sub-samples prevent any further interpretation of the variations. However, given the high sample size in the German data collection, one could assume that this partial result might be most closely related to the true population parameter.

Exploratory analyses

In the following section, exploratory analyses are provided on a post-hoc basis to test a theoretical proposition regarding potential temporal variations of the retroactive avoidance effect. This important theoretical aspect was considered neither by Maier et al. [1] nor in the preregistration of the study presented here. This issue is central to GQT and attracted our closer attention only in 2018, a year before this study ended. Regarding retroactive avoidance, GQT and its special variant of quantum mechanics’ no-signal theorem, the non-transmission axiom (NTA) [23], suppose only the existence of time-symmetric entanglement correlations rather than time-reversed causal signals [20, 23]. Otherwise, the effects of future events on past information processing would violate the Second Law of Thermodynamics [40] and certain restrictions based on special relativity, such as the impossibility of supra-luminal signal transfer [41]. As entanglement correlations behave indeterministically, von Lucadou et al. [23] in their model of pragmatic information (MPI) proposed that, when evidence for retroactive effects has been found in a first study, in later attempts at replication, these effects should disappear as a result of long-term unsystematic variations and lead to a decline or displacement of the overall effect. Moreover, the complementarity relation between effect detection and its future replication prevents the systematic use of retroactive effects at a classical level. Maier, Dechamps, and Pflitsch [35] built on these propositions by arguing that, in spite of these difficulties of replication, a systematic change of evidence could be observed for these effects across time. This additional assumption would help distinguish MPI-dependent decline effects from simple regression-to-the-mean effects.
They suggested that the temporal change of the effect across time might indeed follow a systematic oscillation pattern, whereas the overall mean score of replications should not statistically deviate from chance expectation (see also [37]). In this way, all assumptions of quantum mechanics, including the randomness postulate, would be fulfilled, and scientific evidence for such effects would be shifted from an analysis of the samples’ mean score against chance to systematic temporal oscillations. Dechamps and Maier [37] developed three analytical methods to test these assumed systematic oscillations against random fluctuations.

The three methods were designed to test the oscillations of the effect for data that combined the initial effect detection and replication attempts across time. The time course of the corresponding sequential Bayes factor should be analyzed with: (a) an identification of the highest BF reached at any time during the data collection compared with the highest BFs reached in 10,000 simulations of the data obtained from the same QRNG used in the original design; (b) a test of the area under the sequential BF (energy of the curve) with BF = 1 as baseline compared to the 10,000 simulations; and (c) fast Fourier transforms (FFTs) of the sequential BF of the human data and the 10,000 simulations with a comparison of the amplitudes obtained. These three analyses test the non-random variation of the effect across time and provide a conservative test of non-random fluctuations within such data sets. Again, we wish to emphasize that this theoretical background and these analytical methods were not available during the planning stage of the present study but were only developed during the last year. The following analyses are therefore purely exploratory and are proposed here for testing during future research into effects of this nature.
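The three curve statistics can be illustrated with a short sketch. The exact definitions used in [37] are not given here, so the choices below are assumptions: the "energy" is taken as the summed absolute distance of the BF curve from the baseline BF = 1, the amplitude spectrum comes from a plain discrete Fourier transform rather than an optimized FFT, and the comparison with simulations is expressed as the proportion of simulated values at least as large as the observed one. All function names are hypothetical.

```python
import cmath

def max_bf(curve):
    """(a) Highest Bayes factor reached anywhere along the sequential curve."""
    return max(curve)

def curve_energy(curve, baseline=1.0):
    """(b) ASSUMPTION: area taken as summed absolute distance from the baseline."""
    return sum(abs(bf - baseline) for bf in curve)

def dft_amplitudes(curve):
    """(c) Amplitude spectrum via a direct discrete Fourier transform, O(n^2)."""
    n = len(curve)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(curve))) / n
        for k in range(n // 2 + 1)
    ]

def proportion_at_least(observed, simulated):
    """Share of simulated statistics that reach or exceed the observed one."""
    return sum(1 for s in simulated if s >= observed) / len(simulated)

# Hypothetical curves, for illustration only (not study data).
observed_curve = [1.0, 2.5, 4.0, 1.5, 0.6, 0.3]
simulated_curves = [
    [1.0, 0.9, 0.8, 0.7, 0.6, 0.5],
    [1.0, 1.4, 2.0, 5.0, 2.0, 0.9],
    [1.0, 1.1, 0.9, 1.0, 0.8, 0.7],
    [1.0, 0.8, 0.6, 0.5, 0.4, 0.4],
]
p_max = proportion_at_least(
    max_bf(observed_curve), [max_bf(c) for c in simulated_curves]
)
```

The same `proportion_at_least` comparison applies to the energy and to individual spectral amplitudes, with the simulated null curves providing the reference distribution in each case.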

Temporal analyses

We performed three post-hoc analyses to achieve a better understanding of the effects’ development over time [37]. For these analyses, we included the initial data and our later replication data. These were the only two studies to have been run with a QRNG within this specific retroactive avoidance paradigm and were the only studies relevant to the theory outlined above, since only quantum-based random mechanisms were addressed in this research framework. To this end, we examined the sequential Bayesian analyses of all retroactive avoidance data obtained from these two studies, i.e. a combination of the 324 data files from [1] (Exp. 4; unfortunately, three original E-Prime files were corrupt and could not be used for the present analysis; the original n was 327) and the 2,004 participants in the replication attempt, which were arranged in the exact temporal order of data collection. We compared this dataset of 2,328 participants to 10,000 simulated datasets of the same size. That is, these simulations consisted of 139,680 random bits each (2,328 participants * 60 trials) aggregated in the same fashion as the experimental data. Subsequently, 10,000 sequential Bayesian t-tests with the same parameters as the experimental data (one-tailed; δ ~ Cauchy (0, 0.1)) were conducted based on 2,328 data points each. These simulations represent an experimental null-effect data set. Alpha error probability was set at the .05 level for all analyses.
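The structure of one simulated null dataset can be sketched as follows. This is a minimal illustration, assuming that each bit codes whether a negative picture is shown on a trial and that aggregation means counting those bits per participant; the seeded pseudo-random generator is a stand-in for the QRNG, and the function name `simulate_null_dataset` is hypothetical.

```python
import random

N_PARTICIPANTS = 2328  # 324 original + 2,004 replication participants
N_TRIALS = 60          # trials (random bits) per participant


def simulate_null_dataset(rng, n_participants=N_PARTICIPANTS, n_trials=N_TRIALS):
    """Per-participant counts of 'negative picture' bits under pure chance."""
    return [
        sum(rng.getrandbits(1) for _ in range(n_trials))
        for _ in range(n_participants)
    ]


# Assumption: a pseudo-random generator replaces the quantum source here.
rng = random.Random(2020)
dataset = simulate_null_dataset(rng)
total_bits = N_PARTICIPANTS * N_TRIALS  # bits consumed by one simulation
```

Repeating this 10,000 times and running the same sequential analysis on each dataset would yield the null distribution against which the human data are compared.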

Maximum BF

First, we compared the highest Bayes factor reached in the human sample to those reached in the 10,000 simulations. The highest BF in the human sample was 48.68 and was reached early on, at n = 65 (see Fig 3). Only 1.24% of all simulations reached this or a higher BF at any point (see Fig 4A).
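The comparison amounts to an empirical exceedance rate. A minimal sketch, using invented toy curves and a hypothetical helper `exceedance_fraction`:

```python
def exceedance_fraction(observed, simulated):
    """Share of simulated statistics that reach or exceed the observed one."""
    return sum(1 for value in simulated if value >= observed) / len(simulated)


# Toy sequential BF curves, invented purely for illustration.
human_curve = [1.0, 6.0, 2.5, 0.8]
simulated_curves = [
    [1.0, 0.9, 0.7, 0.6],
    [1.2, 7.5, 3.0, 1.1],
    [0.8, 1.4, 0.9, 0.5],
    [1.1, 2.0, 1.6, 0.9],
]

human_max = max(human_curve)
simulated_maxima = [max(curve) for curve in simulated_curves]
fraction = exceedance_fraction(human_max, simulated_maxima)
```

With 10,000 simulations, the reported 1.24% corresponds to 124 simulated curves whose maximum was at least 48.68.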

Fig 3. The sequential BF across 2,328 participants (red line) compared to 10,000 sequential BFs obtained from simulated data (grey lines). The grey line indicates the median of all simulations. 95% of the sequential BFs obtained from all simulations lie below the dotted line.

Fig 4. a) Maximum BF obtained from the human data (located by the red line) and from 10,000 simulations displayed as a density graph representing the maximum BF null distribution (dark line); b) BF energy obtained from the human data (located by the red line) and from 10,000 simulations displayed as a density graph representing the energy BF null distribution (dark line); c) Sum scores of amplitudes obtained from the FFTs of human data (located by the red line) and of 10,000 simulations displayed as a density graph representing the sum score amplitudes null distribution (dark line).

BF energy

Next, we examined the overall orientation of the BF curve. We calculated the area between the curve and the borderline of evidential power between H0 and H1 at BF = 1. A positive value of this area—also called the curve’s energy—indicates an overall tendency for the BF to be directionally positioned toward H1. The energy of the human sample’s sequential BF was 1453.04, which is surpassed only by 4.2% of the simulations (see Fig 4B). The mean energy of all simulations was found to be M = -844.42 (SD = 8288.72).
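The energy computation can be sketched in a few lines. The exact integration rule is not spelled out in the text, so this example assumes the area is approximated by summing BF − 1 over participants with unit spacing; the name `bf_energy` is hypothetical.

```python
def bf_energy(curve, baseline=1.0):
    """Signed area between a sequential BF curve and the BF = 1 line.

    Assumption: one unit-width rectangle per participant.
    """
    return sum(bf - baseline for bf in curve)


# Invented toy curve: one step above, one on, and one below the baseline.
toy_energy = bf_energy([3.0, 1.0, 0.5])
```

Under this reading, each step contributes at least −1 because a BF cannot be negative, so the energy of a curve with N points cannot fall below −N, whereas large BFs can push it arbitrarily high.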

Admittedly, the results from the maximum BF and the BF energy analyses are to some extent correlated, since extreme BF values are usually accompanied by higher energies and vice versa. To visualize this relation, we added a scatter plot with maximum BF (log-transformed) and BF energy (inverse-hyperbolic-sine-transformed) on the two axes (Fig 5). Each data point in the graph displays the combined result of both analyses, for every simulation (grey dots) and for the human data (red dot). The data cloud indicates how the two methods are related to each other. As can be seen, the outcomes of both analyses are positively correlated and extreme scores are very rare. In addition, the graph locates the human data within the cloud of simulated data: in terms of the combined scores, the human data set is outstanding (p = .0104; that is, only 104 of the 10,000 simulations reached or exceeded its combined score; see blue area in Fig 5). The scatter plot in Fig 5 also indicates that occasionally some time series did not exhibit extreme BFs but reached a high level of energy, and vice versa. This underscores the usefulness of reporting both analyses separately (see above).

Fig 5. Combined scores of maximum BF (log-transformed) and BF energy (inverse-hyperbolic-sine-transformed) based on analyses of human data (red dot) and 10,000 simulations (grey dots). The blue area indicates the number of combined scores obtained from simulations that reach or exceed the human combined score.

Frequency spectrum analysis by fast Fourier transform (FFT)

In the third analysis, we examined the oscillation pattern of the sample’s sequential BF more closely. Any input signal can be converted to a representation of its composite frequencies via a Fourier transformation. This transform indicates the size of the amplitudes of all frequencies that comprise the input sequence. For a random sequence, none of the frequencies should stand out. Noticeable spikes, however, indicate the presence of a periodic element. An FFT was conducted on the sequential Bayesian analyses of the human data and on each of the 10,000 simulations. The sampling rate was 1/N in each case. Since the resulting transform is symmetric, only the first half was considered in the analysis, resulting in 1,164 tested frequencies. To test the FFT results from the human data against chance occurrence, all 1,164 amplitudes obtained from the FFT of the human data set were added up, creating a sum score of the amplitudes across all tested frequencies. The sum score of amplitudes was computed in the same way for each of the 10,000 simulations (Dechamps and Maier [37] used a similar but less elegant test). The distribution of the sum scores of amplitudes across all simulations served as the null distribution (see Fig 4C). The sum score of amplitudes of the human data set was Sum_amp = 34.84. Only 147 of the simulations (1.47%) reached a sum score of 34.84 or higher.

As Fig 4C illustrates, the sum score of amplitudes of the human data (red line) is at the extreme end of the density distribution (dark line). The human sample FFT revealed a more pronounced amplitude pattern than 98.53% of the simulations.
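The sum score of amplitudes can be illustrated with a short sketch. It uses a plain discrete Fourier transform, which yields the same coefficients as an FFT but more slowly, and it rests on assumptions that are not given in the text: amplitudes are divided by the series length, and the "first half" is taken to be frequency indices 0 to N/2 − 1. The function name `amplitude_sum` is hypothetical.

```python
import cmath
import math


def amplitude_sum(series):
    """Sum of DFT amplitudes over the first half of the spectrum.

    Assumptions: amplitudes are scaled by 1/N, and the first half covers
    frequency indices 0 .. N//2 - 1.
    """
    n = len(series)
    total = 0.0
    for k in range(n // 2):
        # Naive DFT coefficient for frequency index k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(series)
        )
        total += abs(coeff) / n
    return total


# A pure sine wave with three full cycles over 64 samples.
demo_series = [math.sin(2 * math.pi * 3 * i / 64) for i in range(64)]
demo_sum = amplitude_sum(demo_series)
```

For the pure sine wave, a single frequency carries all of the amplitude, so `demo_sum` comes out at about 0.5. For a series of 2,328 points, the same loop covers 1,164 frequencies, matching the number tested above.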

In sum, all three analyses of the sequential BF indicate a non-random variation of the effect across time. The variation of the BF curve across time seems to be more pronounced in the human data than in most simulated data sets. This fluctuation in the human data creates an anomaly that fits with von Lucadou et al.’s [23] propositions and is in line with Dechamps and Maier’s predictions [37]. The data files, including the simulation data and the original data from Maier et al. (Exp. 4) [1], the R code, and the jsPsych code can be found here: https://osf.io/yqqfz/files/.

Discussion

The main goal of this study was to test the replicability of a retroactive avoidance effect observed by Maier et al. [1] (Exp. 4) in a preregistered, high-power, multi-lab study (N = 2,004). With the aid of Bayesian analysis, we tested the hypothesis that randomly presented future negative events can be anticipatorily avoided at an unconscious level. Although the predefined stopping rule of BF > 10, which would indicate strong evidence in favor of either H1 or H0, was not reached, the BF01 found within the actual data was 4.38. This provided moderate evidence in favor of the null hypothesis: the observed data were about four times more likely under the null hypothesis than under the alternative hypothesis. Thus, our prediction was not substantiated, and the results reported by Maier et al. [1] (Exp. 4, ES = 0.1) were not replicated in the present study. Although the moderate evidence leaves room for speculation that the effect might be confirmed should additional data be collected, the research team in Germany is pessimistic in this regard. Other authors, however, consider that using selected participants, enhancing motivation, and so on may yield different results, as indicated by previous research [42, 43]. The pessimism rests on two observations. First, the actual sequential Bayesian curve is rather flat from the second half of the study onward, with a trend toward growing evidence for H0, and it seems unlikely that this trend will turn again in the opposite direction. Second, applying wider priors to the Bayesian analysis revealed strong evidence for the null effect (BFs01 > 30). Thus, the fact that we did not reach our actual stopping criterion is due to the prior chosen rather than to a lack of power. We therefore conclude that, based on the overall mean score of avoidance reactions, no retroactive influences were detectable in this experiment.
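How a Bayes factor of this size translates into a posterior probability depends on the prior odds, which the reader has to supply. The following sketch works through the arithmetic for BF01 = 4.38 under the assumption of equal prior odds; the helper `posterior_prob_h0` is hypothetical.

```python
def posterior_prob_h0(bf01, prior_odds_h0=1.0):
    """Posterior probability of H0 from BF01 and the prior odds H0 : H1."""
    posterior_odds = bf01 * prior_odds_h0
    return posterior_odds / (1.0 + posterior_odds)


bf01 = 4.38
bf10 = 1.0 / bf01                 # the same evidence expressed in favor of H1
p_h0 = posterior_prob_h0(bf01)    # assumption: prior odds of 1 : 1
```

With equal prior odds, the posterior probability of H0 is 4.38 / 5.38, roughly 0.81. A reader who starts from different prior odds would arrive at a different posterior probability from the same BF.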

This result is in line with earlier studies that were unable to replicate retroactive influences at a conscious level using other paradigms (see e.g., [4, 5], partly [3]). It should be noted, however, that these replication attempts failed when they used slow thinking paradigms that involved more conscious processing during response preparation and future event perception, whereas studies using fast thinking protocols have yielded better results [3]. The problem with these latter replication studies, however, is that they lack statistical power. Thus, the present study is, to our knowledge, currently the only study to test precognition using a fast thinking protocol that meets all the criteria for a rigorous, scientifically convincing replication according to [8], and this replication found evidence for a null effect. However, we encourage researchers to test each paradigm that has been supported in the past with an approach similar to that used here. Powerful replications, as proposed by [12] and [8], would add evidence to the meta-analyses already available [3] and would offer a clearer picture of the replicability of results in these particular precognition or retroactive avoidance research paradigms. The importance of pre-registered replications was recently supported by an analysis of 15 domains in psychology in which meta-analytic estimates of effect sizes were on average almost three times larger than the estimates from pre-registered replications [44].

In addition, the null result observed here is also in line with the assumptions made in standard quantum mechanics and special relativity theory, formulated as the no-signal theorem, which argues that macroscopic retroactive effects are impossible (for an overview see [45]). Furthermore, theories combining conscious and unconscious processing with computational concepts of quantum theories, such as GQT [18, 20–23], view retroactive avoidance as entanglement correlations that obey the indeterminacy principle in line with the no-signal theorem. Von Lucadou et al. [23] therefore explicitly argue for an almost zero likelihood of successfully replicating such effects, although these effects should occur unsystematically. However, they propose a special method, the matrix design, that could allow significant results to be maintained because it gives the effect more degrees of freedom [46].

Dechamps and Maier [37] (see also [35, 36]), by taking the theoretical constraints of the no-signal theorem into account, recently proposed an extension of the GQT that supposes systematic non-random oscillations of the evidence for the effect across time. The results are also in line with previous observations regarding replicability in the field of psi research [47]. Bierman [48] describes it as “negative reliability”; Beloff [49] speaks of psi as “actively evasive”; Pallikari and Boller [50] mention a “balancing effect” between positive and negative replications; and Hansen [51] has proposed a broader theory called “the trickster” to explain negative results of this nature. Thus, rather than testing the mean score against chance, future methods might instead focus on FFTs and similar procedures that test systematic oscillation-like variations of evidence for the effect across time (see also [52]). The temporal analyses provided here on a post-hoc basis yielded promising evidence in this direction. One interpretation of the results obtained in the study would be that retroactive avoidance appears, disappears, and even turns into opposite trends from time to time and on different scales (on various frequencies across a broad spectrum; see FFT results above) when additional data are added. This cyclic pattern may be typical for psi effects in general, as indicated by [37, 53]. We wish to emphasize that these analyses do not provide a confirmation of GQT and its extension at this point since they are purely post-hoc in nature. In addition, such a cyclic pattern might also be produced by chance. We quote a reviewer’s argument here: “I’ll propose that it seems much more plausible that the observed pattern of significant original studies followed by replication failures could be produced if there is zero retroactive avoidance effect in the population. The original significant observations were possibly due to some combination of chance occurrence, flexibility in data analysis, and publication/file drawer bias (note: I think it’s quite easy for all of these things to occur even with the best intentions unless you specifically safeguard against them–which is why it’s so important that the authors pre-registered the present project)–and the replication failures are simply correctly identifying a true null effect. This explanation has, in general, been demonstrated as quite plausible in simulations (e.g., [12]), and indeed is a major driving force in terms of the current overhaul in methods in social psychology and other fields.” We admit that such an argument is quite convincing and provides a serious challenge to our interpretation of the temporal variation of the effect. All we can say for now is that the original study, although not preregistered, worked with an a priori planned sample size and that, because it was a final study conceptually replicating a series of six previous ones, possibilities for biased reporting were somewhat limited. In addition, the FFT results show a quite unusual oscillation pattern that can hardly be obtained by chance, as indicated by a comparison with the FFT patterns found within 10,000 simulations. One might therefore wonder whether a false positive in a first study and a true negative in a replication might also produce such an outstanding pattern of FFT results. However, to settle that discussion, a fully preregistered confirmatory study predicting a change of the effect across time would be needed. Hence, these results should encourage researchers to follow this approach by conducting empirical research that addresses time-dependent effect changes a priori and in a confirmatory way. Research is always based on trial and error, and exploratory analyses like these can expand our horizons and open up promising new avenues of scientific exploration.

Acknowledgments

DM was supported by the SPR Research Fund from the Society for Psychical Research.

Data Availability

The data underlying the results presented in the study are available from https://osf.io/yqqfz/files/

Funding Statement

DM was supported by the SPR Research Fund from the Society for Psychical Research. https://www.spr.ac.uk/home The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Maier MA, Buechner VL, Kuhbandner C, Pflitsch M, Fernández-Capo M, Gámiz-Sanfeliu M. Feeling the future again: Retroactive avoidance of negative stimuli. Journal of Consciousness Studies. 2014; 21(9–10): 121–152.
  • 2. Bem DJ. Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology. 2011; 100(3): 407–425. doi: 10.1037/a0021524
  • 3. Bem DJ, Tressoldi PE, Rabeyron T, Duggan M. Feeling the future: A meta-analysis of 90 experiments on the anomalous anticipation of random future events [version 2; peer review: 2 approved]. F1000Research. 2016; 4: 1188. doi: 10.12688/f1000research.7177.2
  • 4. Galak J, LeBoeuf RA, Nelson LD, Simmons JP. Correcting the past: Failures to replicate psi. Journal of Personality and Social Psychology. 2012; 103(6): 933–948. doi: 10.1037/a0029709
  • 5. Ritchie S, Wiseman R, French C. Failing the future: Three unsuccessful attempts to replicate Bem’s ‘Retroactive facilitation of recall’ effect. PLoS One. 2012; 7: 1–5.
  • 6. Francis G. Too good to be true: publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review. 2012; 19(2): 151–156.
  • 7. Schimmack U. The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods. 2012; 17(4): 551–566. doi: 10.1037/a0029487
  • 8. Wagenmakers E-J, Wetzels R, Borsboom D, van der Maas HLJ. Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). Journal of Personality and Social Psychology. 2011; 100: 426–432. doi: 10.1037/a0022790
  • 9. Rouder JN, Morey RD. A Bayes factor meta-analysis of Bem’s ESP claim. Psychonomic Bulletin & Review. 2011; 18: 682–689.
  • 10. Bem DJ, Utts J, Johnson WO. Must psychologists change the way they analyze their data? Journal of Personality and Social Psychology. 2011; 101(4): 716–719. doi: 10.1037/a0024777
  • 11. LeBel EP, Peters KR. Fearing the future of empirical psychology: Bem’s (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology. 2011; 15(4): 371–379.
  • 12. Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science. 2011; 22: 1359–1366. doi: 10.1177/0956797611417632
  • 13. Hameroff S, Penrose R. Orchestrated reduction of quantum coherence in brain microtubules: A model for consciousness. Mathematics and Computers in Simulation. 1996; 40(3–4): 453–480. doi: 10.1016/0378-4754(96)80476-9
  • 14. Penrose R. The Emperor’s New Mind: Concerning Computers, Minds and the Laws of Physics. New York: Oxford University Press; 1989.
  • 15. Penrose R. Shadows of the Mind (Vol. 4). Oxford: Oxford University Press; 1994.
  • 16. Penrose R, Hameroff S. What gaps? Reply to Grush and Churchland. Journal of Consciousness Studies. 1995; 2: 98–112.
  • 17. Penrose R, Hameroff S. Consciousness in the universe: Neuroscience, quantum space-time geometry and Orch OR theory. Journal of Cosmology. 2011; 14: 1–17.
  • 18. Atmanspacher H, Römer H, Walach H. Weak quantum theory: Complementarity and entanglement in physics and beyond. Foundations of Physics. 2002; 32(3): 379–406.
  • 19. Atmanspacher H, Filk T. A proposed test of temporal nonlocality in bistable perception. Journal of Mathematical Psychology. 2010; 54: 314–321.
  • 20. Atmanspacher H, Filk T. Contra classical causality: Violating temporal Bell inequalities in mental systems. Journal of Consciousness Studies. 2012; 19: 95–116.
  • 21. Filk T, Römer H. Generalized quantum theory: Overview and latest developments. Axiomathes. 2011; 21(2): 211–220.
  • 22. Römer H. Weak quantum theory and the emergence of time. Mind and Matter. 2004; 2(2): 105–125.
  • 23. von Lucadou W, Römer H, Walach H. Synchronistic phenomena as entanglement correlations in generalized quantum theory. Journal of Consciousness Studies. 2007; 14(4): 50–74.
  • 24. Schrödinger E. Die gegenwärtige Situation in der Quantenmechanik. Naturwissenschaften. 1935; 23(49): 823–828.
  • 25. Bell JS. On the Einstein–Podolsky–Rosen paradox. Physics. 1964; 1(3): 195–200.
  • 26. Hameroff S. How quantum brain biology can rescue conscious free will. Frontiers in Integrative Neuroscience. 2012; 6(1): 93. doi: 10.3389/fnint.2012.00093
  • 27. Tressoldi PE, Maier MA, Buechner VL, Khrennikov A. A macroscopic violation of no-signaling in time inequalities? How to test temporal entanglement with behavioral observables. Frontiers in Psychology. 2015; 6: 1061. doi: 10.3389/fpsyg.2015.01061
  • 28. Aharonov Y, Cohen E, Elitzur AC. Can a future choice affect a past measurement’s outcome? Annals of Physics. 2015; 355: 258–268.
  • 29. Megidish E, Halevy A, Shacham T, Dvir T, Dovrat L, Eisenberg HS. Entanglement swapping between photons that have never coexisted. Physical Review Letters. 2013; 110(21): 210403.
  • 30. Mensky MB. Mathematical models of subjective preferences in quantum concept of consciousness. NeuroQuantology. 2011; 9(4).
  • 31. Mensky MB. Everett Interpretation and Quantum Concept of Consciousness. NeuroQuantology. 2014; 11(1). doi: 10.14704/nq.2013.11.1.635
  • 32. Stapp HP. Mindful universe: Quantum mechanics and the participating observer. Berlin: Springer; 2007.
  • 33. Turiel TP. Quantum random bits generators. The American Statistician. 2007; 61: 255–259.
  • 34. Lang PJ, Bradley MM, Cuthbert BN. International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical report A-8. Gainesville, FL: University of Florida; 2008.
  • 35. Maier MA, Dechamps MC, Pflitsch M. Intentional Observer Effects on Quantum Randomness: A Bayesian Analysis Reveals Evidence Against Micro-Psychokinesis. Frontiers in Psychology. 2018; 9: 379. doi: 10.3389/fpsyg.2018.00379
  • 36. Maier MA, Dechamps MC. Observer effects on quantum randomness: Testing Micro-psychokinetic effects of smokers on addiction-related stimuli. Journal of Scientific Exploration. 2018; 32(2): 265–297.
  • 37. Dechamps MC, Maier MA. How Smokers Change Their World and How the World Responds: Testing the Oscillatory Nature of Micro-Psychokinetic Observer Effects on Addiction-Related Stimuli. Journal of Scientific Exploration. 2019; 33(3): 406–434.
  • 38. Rouder JN, Speckman PL, Sun D, Morey RD, Iverson G. Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review. 2009; 16(2): 225–237.
  • 39. JASP Team. JASP (Version 0.10.1) [Computer software]. 2017. Available at: https://jasp-stats.org/
  • 40. Boltzmann L. Über die Beziehung zwischen dem zweiten Hauptsatz der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respektive den Sätzen über das Wärmegleichgewicht. Sitzungsberichte der kaiserlichen Akademie der Wissenschaften zu Wien, II 76, S. 428. Nachdruck in Wissenschaftliche Abhandlungen von Ludwig Boltzmann, 1877; Band II., S. 164–223.
  • 41. Einstein A. Zur Elektrodynamik bewegter Körper. Annalen der Physik und Chemie. 1905; 17: 891–921.
  • 42. Baptista J, Derakshani M, Tressoldi PE. Explicit anomalous cognition: A review of the best evidence in ganzfeld, forced-choice, remote viewing and dream studies. In: Cardeña E, Palmer J, Marcusson-Clavertz D, editors. Parapsychology: A Handbook for the 21st Century. Jefferson, NC: McFarland & Company; 2015. pp. 192–214.
  • 43. Honorton C, Ferrari DC. Future telling: A meta-analysis of forced-choice precognition experiments, 1935–1987. Journal of Parapsychology. 1989; 53(4): 281–308.
  • 44. Kvarven A, Strømland E, Johannesson M. Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nature Human Behaviour. 2020; 4(4): 423–434. doi: 10.1038/s41562-019-0787-z
  • 45. Greenstein G, Zajonc AG. The quantum challenge: Modern research on the foundations of quantum mechanics (2nd ed). Boston: Jones and Bartlett; 2006.
  • 46. Walach H, Horan M, Hinterberger T, von Lucadou W. Evidence for anomalistic correlations between human behavior and a random event generator: Result of an independent replication of a micro-pk experiment. Psychology of Consciousness: Theory, Research, and Practice. 2019. doi: 10.1037/cns0000199
  • 47. Rabeyron T. Retro-priming, priming, and double testing: Psi and replication in a test-retest design. Frontiers in Human Neuroscience. 2014; 8: 154. doi: 10.3389/fnhum.2014.00154
  • 48. Bierman D. Negative reliability: the ignored rule. In: Roll WG, Beloff J, editors. Research in Parapsychology. Metuchen: Scarecrow Press; 1980. pp. 14–15.
  • 49. Beloff J. Lessons of history. Journal of the American Society for Psychical Research. 1994; 88: 7–22.
  • 50. Pallikari F, Boller E. Further evidence for a statistical balancing in probabilistic systems influenced by the anomalous effect of conscious intention. Journal of the Society for Psychical Research. 1997; 62(849): 114–137.
  • 51. Hansen GP. The Trickster and the Paranormal. Philadelphia: Xlibris Corporation; 2001.
  • 52. Radin D. Tricking the Trickster: Evidence for Predicted Sequential Structure in a 19-Year Online Psi Experiment. Journal of Scientific Exploration. 2019; 33: 549–568.
  • 53. Kennedy JE. The capricious, actively evasive, unsustainable nature of psi: A summary and hypotheses. Journal of Parapsychology. 2003; 67: 53–74.

Decision Letter 0

Florian Naudet

16 Apr 2020

PONE-D-20-02005

A Preregistered Multi-Lab Replication of Maier et al. (2014, Exp. 4) Testing Retroactive Avoidance

PLOS ONE

Dear Dr. Maier,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Thank you for sending us this manuscript. We apologize for the delay in reviewing it; one of the initial reviewers did not respond despite having committed to review it. Three reviewers agreed to review the paper, and all did a thorough job. I want to thank them here, and I am very grateful for the important insights they provided on your manuscript. I am sure that you will be able to improve it by following all their comments. Please also make sure that you are following all the appropriate reporting guidelines (an adapted checklist may be useful).

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Florian Naudet, M.D., M.P.H., Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for including your ethics statement: "All research presented in this article involved human participants, and the protocol was approved by the respective ethical boards of all participating universities.

Written consent was obtained from all participants."

a) Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study.

b) Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”).

For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this article Maier and colleagues attempt a direct replication of a pre-cognition study published in 2014.

This study tries to establish if the subliminal presentation of a picture with emotionally negative content presented at a given time can reduce participants’ responses to select which picture (negative VS neutral) to view EVEN BEFORE it has been decided which “key press” will be associated with which type of picture.

This work involved 5 different labs and is based on a large sample of participants acquired over several years.

This study was preregistered and the results contain a fairly clear distinction between the confirmatory part (the replication proper) and its exploratory part.

The confirmatory analysis is based on the use of a Bayes factor to measure the relative strength of the evidence in favor of the null hypothesis (future presentation of negative stimuli has no effect on the choice of which key will be associated with which stimulus) compared to the alternative hypothesis (future presentation of negative stimuli reduces the chance that the “selected key” will be associated with a negative stimulus, and thus should lead to a lower-than-chance proportion of aversive stimuli being presented). The prediction being directional, the authors use a directional Bayesian one-sample t-test.

Data acquisition was stopped before reaching the pre-registered thresholds because of lack of resources. Based on the available data and some post-hoc analysis the authors conclude that a true null effect was detected.

The exploratory part of the analysis relies on trying to find anomalies in the time series generated by the sequential analysis Bayes factor and by relying on a frequentist approach and using simulations to generate the null distribution.

Part of the data presented is available on OSF.

In general the manuscript is fairly clear, and I generally agree with the results of the confirmatory analysis. The exploratory part presents one major issue (lack of correction for multiple comparison) that should be fixed (see the exploratory section below). I also have some suggestions to make both on the form and the content of the manuscript.

MAIN ISSUES

- lack of correction for multiple comparison in the exploratory analysis

- too much emphasis put on the very speculative theoretical underpinnings of the work

Exploratory analysis

In this part the authors apply what amounts to a frequentist approach to detect surprising results. They rely on simulations to produce the null distribution against which to test their data.

I suggest that the authors state the statistical threshold they use to establish significance. It seems to be 5% but I do not see it mentioned in the manuscript.

I suspect that the first 2 exploratory analyses done by the authors are partly redundant and do not provide independent lines of evidence. Curves with higher Bayes factors will depart further from the BF=1 line and hence have a larger area. I recommend that, to better visualize this, the authors use a scatter plot of the simulations for “maximum BF VS area” and display where the data from this study actually lie in this scatter plot.

The main issue in this section lies in the 3rd analysis and, to a similar extent, in the 1st.

The 3rd analysis tests 1164 frequencies and in consequence, unless I missed something, shows a need for multiple comparison correction. However, I am unclear as to how the authors corrected for this. Without this the risk of false positives might be clearly inflated. Because of the smoothness of the data, 2 neighboring frequencies are unlikely to be independent data points, so I suspect that applying a Bonferroni correction with a significance threshold of 0.000042955 (0.05 / 1164) will be deemed overly conservative. If so, I suggest that the authors look for inspiration in the statistical methods literature related to EEG to find ways to best correct their tests.
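The arithmetic behind this threshold can be checked with a short sketch. The function names are hypothetical, and the familywise error rate is computed under the assumption of independent tests, which the smooth frequency data are unlikely to satisfy; it is shown only as an upper-end illustration of how strongly uncorrected testing could inflate false positives.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests uncorrected tests.

    Assumes the tests are independent (an illustrative assumption only).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Values taken from the comment above: alpha = 0.05 and 1164 tested frequencies.
threshold = bonferroni_threshold(0.05, 1164)
fwer = familywise_error_rate(0.05, 1164)
print(f"Bonferroni threshold: {threshold:.9f}")
print(f"Uncorrected familywise error rate (independent tests): {fwer:.6f}")
```

With correlated neighboring frequencies the true familywise error rate would be lower than this independent-test figure, which is why a Bonferroni correction may be too strict here.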

The first exploratory analysis suffers from a similar problem even though it is not as obvious. By drawing a 95% confidence interval and looking for data points in the BF time series that “stand out”, the authors are implicitly running a statistical test at every time point, so more than 2000 statistical tests: this too requires a correction for multiple comparison. Here again the comments made in the previous paragraph regarding data smoothness and finding the right level of correction apply, but they are complicated by the fact that we are dealing with a time series and that autocorrelation might also be an issue.

On a different note, I am wondering why the authors did not include an analysis similar to the harmonic oscillation approach used by Dechamps and Maier (2019).

Comments on theoretical underpinning

In general I would suggest that the authors spend a lot less time on the theoretical speculations and inspirations behind this work.

I think that the authors are putting the cart before the horse, and trying to link their work to quantum mechanics. It is very premature to do so before they have even ascertained the existence of the phenomenon that they are trying to explain. I would strongly suggest that the authors move a lot of the links made to quantum mechanics (introduction line 71 to 118 ; results line 331 to 358) to the discussion section.

I took the liberty to consult a physicist colleague on the parts of the manuscript that were outside of my area of expertise. I am adding below some of her comments that I suggest the authors should take into account.

- l. 83 : Non locality is only ever spatial. There is no reason for it to imply a temporal bi-directionality and this idea does not appear in the mentioned references (14, 24, 26).

- l. 85 : The cited reference does not correspond to what is mentioned in this assertion. The Emperor's New Mind is a book comparing human mathematical reasoning and Turing machines and has no link with temporal bidirectionality.

- l. 347 : The authors seem to view entropy as a force that acts in nature to counteract specific effects, whereas it is in fact a descriptive mathematical quantity. Entropy does not exist as such, but is a sort of indicator invented to summarize several factors that contribute to a certain level of disorder of a set of molecules [...]. In a closed system, the level of disorder always increases (and along with it the entropy that quantifies it), and this is due to the low number of ordered configurations, which therefore have a low probability; it is not due to some property of the bodies or the motions involved that would be called entropy and that would condition the possible configurations. The entropy represents the increase in disorder, it does not cause it. Similarly, the increase in disorder is a sign of time passing by [...], but this increase in disorder does not cause the passing of time nor does it condition it.

Penrose himself recognizes that his ideas on the nature of consciousness are speculative, and his ideas are considered wrong by several experts in several fields. Moreover, Orch-OR is only related to a potential role of quantum mechanics in the emergence of consciousness and does not bear any link to temporal bidirectionality.

The GQT does not have any link with temporal bidirectionality either. And calling it a theory is premature given that it is at this stage a mere hypothesis formulated by a small group of physicists, untested and unknown to the physics community. It is also incomplete and unable to account for many of the fundamental aspects of quantum mechanics, as admitted by the GQT defenders themselves. It is at this stage very premature to base new research on it.

OTHER ISSUES

Data

Using the data on OSF and JASP, I was partly able to reproduce some of the results of the confirmatory part.

The lack of a data dictionary explaining what each column in the data represents left me wondering why I was not getting the same results as the authors, until I realized that the number reported in the data was the number of neutral pictures presented to each participant.

Similarly I was not able to easily reproduce all the results from the “Variations across Labs” section. This mostly stems from the way the data was coded, in which there was not a uniform way to know which data point comes from which lab.

Similarly it seems that different labs had different file naming conventions and time stamping of their data files. I would suggest that an extra variable is added to keep track of the overall order in which data samples were acquired, especially given the time series aspect of the data seemed important for the exploratory analysis that the authors suggest.

Similarly, the filenames of some samples suggest that they are copies of other files or were possibly created when moving files between folders ( back2neg_neu_deutsch Version2-524-1 (2).txt ), so I would suggest that the authors add some code to their analysis pipeline to quality check and consolidate the dataset to remove the possibility of including any duplicates.
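A minimal sketch of such a quality check is given below, assuming one plain-text file per session stored in a single folder. The function names and the copy-marker pattern are hypothetical and are not taken from the authors' pipeline.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical pattern for names such as "... (2).txt", which often mark copies.
COPY_SUFFIX = re.compile(r"\(\d+\)\.\w+$")


def file_digest(path: Path) -> str:
    """SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_duplicates(folder: Path) -> list:
    """Group files in `folder` whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(folder.iterdir()):
        if path.is_file():
            groups[file_digest(path)].append(path.name)
    return [names for names in groups.values() if len(names) > 1]


def copy_like_names(folder: Path) -> list:
    """List file names that end in a copy marker such as ' (2).txt'."""
    return [p.name for p in sorted(folder.iterdir()) if COPY_SUFFIX.search(p.name)]
```

Both lists are meant as flags for manual review rather than automatic deletion: identical content can in principle be legitimate, and a copy marker in a name does not prove that the file is redundant.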

Concerning the exploratory part, I would suggest that the authors also share the data resulting from the 10 000 simulated datasets to allow for a computational reproduction of the authors’ results.

Similarly, and I might be mistaken about this, but I think that ALL the 327 data points from Maier 2014 are actually accessible and not just the 324 as reported in the manuscript. I was able to download them by using the information actually provided in Maier 2014 and get a plot of the sequential analysis similar to figure 3 of the manuscript. In order to spare future readers this bit of data archaeology, I would suggest that the results from Maier 2014 are either mentioned (with a link) on the OSF project or, better, added to the OSF project related to this paper (if only because of the long shelf life of things put on OSF compared to that of some institutional or personal repository).

Code

For the exploratory part I would suggest that the authors make available the R code used to analyze the data and generate their figures, not only for transparency, but for other future researchers who might want to further explore those results to evaluate whether they are worth investing their time and effort to replicate.

Similarly and to simplify the work of future replications, I would suggest that the authors share the E-prime and jsPsych code to their experiment.

Pre-registration

I noted some deviations between the pre-registration and the manuscript. I would suggest that the authors cross-check the content and flag any deviations they might spot with the reason behind them. The authors can take some inspiration from the SMART preregistration format: https://osf.io/6vhyt/

From what I can see it seems that the multi-site nature of this experiment was decided after the pre-registration. I suggest the authors give more details on what measures were taken to ensure the same procedures were applied in all laboratories.

The stimuli and mask presentation time are slightly different between the pre-registration and the manuscript. Is this because the refresh rate of the monitor was different?

Also, as a side note, and because short presentation times seem important in the current study as the stimuli should be presented subliminally: stimulus presentation timings are notoriously inaccurate on Windows machines, and the presentation times reported by experimental software should rarely be taken at face value. I would strongly urge the authors, for future work, to measure the timing externally with a photodiode and report the actual timing accuracy they can get on their set-up.

The robustness checks of the results using different priors and comparing the results across labs were not part of the original pre-registration.

So in general I would suggest that the confirmatory and exploratory parts of the results sections be made even more clear:

1. “Main analysis” should be named “Confirmatory analysis”

Report the pre-registered analysis

1.1. Additional analysis

Report robustness and across labs comparisons

2. “Temporal change across time” should be named “exploratory analysis”

The actual time stamp of the pre-registration is 2014-07-02 07:19 PM, so several months after the experiment was started and after 260 participants were tested. This is not necessarily an issue by itself but it should be mentioned for transparency.

Figures / results consistency

Most of the results in the text are expressed as BF01 but the graphs display BF10. Consistency among the 2 would be good.

Minor comments.

In some places the authors use the word “power” where “sample size” would be more appropriate : eg. line 323.

Reviewer #2: This is a nice and detailed study. The authors undertook a careful exercise to test the replicability of previous findings from a retroactive avoidance effects study, now with a significantly larger sample size, and a well thought out Bayesian sequential testing approach. The authors relied on the Bayes Factor (BF), clearly the best index for evidence detection in the Bayesian world (if the BF can be efficiently computed, at all). The scientific (statistical) premise looks adequate. In the end, they found that I do have some comments, and clarifications.

1. With the purpose of reaching a wider audience, the writing style is pretty verbose. Maybe it can be made more crisp?

2. I understand the sequential approach may not start with reaching a fixed sample size. But what's the effect size we are looking into (not clearly specified)? The writeup, on page 13, directly goes into the prior specification of $\delta$ to be Cauchy; some argument is necessary for the reader on the context behind that, and citing some literature is not enough.

3. In the "variation study" across labs in various countries, the sample size of Germany is considerably higher than the others. With more data, there is the possibility that it will influence the posterior. I don't see any discussion in this regard while explaining Table 1.

4. Manuscript needs to be checked for inconsistencies in spelling, such as "Psychical" research, should be "Physical".

Reviewer #3: Overall, the manuscript presents rigorously collected data and provides good context for the project, including relevant methodological critiques. The authors engaged with these critiques and took appropriate steps to address them, such that the present project has several methodological advantages (pre-registration, open data, and a large multi-national dataset). The project presents a challenge in that many researchers (myself included) would be skeptical a priori of the retroactive avoidance hypothesis, and admittedly I am no expert in many of the topics being presented (quantum mechanics, etc.). However, I followed the developments surrounding the Bem paper quite closely, and overwhelming data could force a re-examination of those beliefs.

A major selling point of the current project, then, is being pre-registered and presenting open data (although, I think a Registered Report – guaranteeing publication regardless of outcome and thus eliminating publication bias -- might be necessary for such a controversial topic). The pre-registration does a good job laying out the procedure, methods, and decision rules for the analysis. It goes a long way to constrain researcher degrees of freedom, but there remain a few places relating to the analysis and treatment of data that could be further constrained in future efforts. For example, how will the 60 responses per participant be handled? What is the specific statistical test being used? Are there any provisions for missing data or exclusions? Will data from all labs be combined into one and analyzed together, or will meta-analysis or similar be used? These dimensions all introduce possible flexibility in data analysis; that said, for the present project I don't see this as a big concern because the reported analyses follow pretty straightforwardly from what is described in the pre-reg.

The procedures and results are well described in the manuscript, however I think it’s very important to upload the files themselves to the OSF for reproducibility and verification (the eprime files, and syntax for the analysis if available). Because the data collection stopped earlier than planned, the resulting estimates leave perhaps some room for debate. However, from my perspective this is quite compelling evidence for an overall null effect and I’m not sure collecting additional participants would be warranted. I also think variation between sites would be conveyed better through a meta-analysis (P 15), instead of eye-balling the results from different sites. This would give you metrics for heterogeneity (tau, Q, and I^2) indicating the level of heterogeneity between sites. But, this is not too central to the paper and likely would result in the same conclusion (e.g., I suspect you’d get heterogeneity estimates that don’t exceed chance). Overall, I found the analysis and presentation of the confirmatory results quite solid.
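The heterogeneity metrics mentioned here can be sketched as follows, assuming one effect estimate and one sampling variance per site. The DerSimonian-Laird estimator for tau is one common choice and is an assumption of this sketch, as are the function name and the input values, which are made up for illustration and are not the manuscript's lab-level results.

```python
import math


def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (in percent) and DerSimonian-Laird tau for k site estimates."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    w_sum = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / w_sum
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return {"pooled": pooled, "Q": q, "df": df, "I2": i2, "tau": math.sqrt(tau2)}


# Made-up per-site effect sizes and sampling variances, for illustration only;
# these are NOT the values from the manuscript's Table 1.
site_effects = [0.0, 2.0]
site_variances = [0.5, 0.5]
result = heterogeneity(site_effects, site_variances)
print(result)
```

Q is compared against a chi-square distribution with `df` degrees of freedom; a Q close to `df` gives I^2 and tau near zero, which is the pattern expected if the sites differ only by chance.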

Perhaps the area I would critique is the exploratory analyses and the logic underlying the temporal change hypothesis (P 16). To me, the logic behind this argument does not seem very compelling. It seems entropy would have to have “memory” to correct the initial violation of thermodynamics with a second violation of the same law in the opposite direction. This bears a surface level similarity to the gambler’s fallacy, although perhaps I’m not understanding on a deep level. In practice, then, participants in earlier stages of the experiment would show retroactive avoidance, whereas participants in later stages of the experiment (or in replications) would be showing the opposite pattern? I realize this is a hypothesis generated since the project began, but if this is the case you could find robust evidence for this retroactive avoidance effect by instead running a Registered Report using a novel paradigm. Regardless, as the authors note this is quite a challenging hypothesis in terms of the implications for current scientific thinking and methods, and would go against much of our current understanding, so I think there would need to be pretty compelling evidence to validate it. But, I’m also not sure it’s my place to be critiquing these explanations or deeming them plausible or implausible given my lack of expertise in the topics being discussed.

I’ll propose that it seems much more plausible that the observed pattern of significant original studies followed by replication failures could be produced if there is zero retroactive avoidance effect in the population. The original significant observations were possibly due to some combination of chance occurrence, flexibility in data analysis, and publication/file drawer bias (note: I think it’s quite easy for all of these things to occur even with the best intentions unless you specifically safeguard against them – which is why it’s so important that the authors pre-registered the present project) – and the replication failures are simply correctly identifying a true null effect. This explanation has, in general, been demonstrated as quite plausible in simulations (e.g., Simmons et al., 2011), and indeed is a major driving force in terms of the current overhaul in methods in social psychology and other fields. The authors acknowledge this possibility, and pre-registered the present project to safeguard against this.

Then, admittedly I had a pretty high bar for evidence going into the exploratory analyses. The authors appropriately identified these analyses as exploratory, and provided suitable cautions about how readers should interpret them.

Exploratory analyses: (P18-20) With the data being arranged temporally, this high BF is found in the data from the Maier et al., study 4 data, that I understand was not pre-registered? This is, I think, what we would expect given that the earlier study reported a significant result. The problem is that we don't know if this original result may have suffered from some of the classic problems we're concerned about in non-preregistered projects (see above). We're assuming the data from the current replication do not suffer from those issues due to the pre-registration (but, note a Registered Report is the only real way to guarantee no publication bias).

Therefore I'm not sure if this analysis is too convincing for a temporal pattern, so much as a pattern where the pre-registered study found smaller (null) effects? There is some discussion of this issue in the RPP paper (OSC, 2015) and this seems more plausible from my perspective. This concern would apply to both Maximum BF and BF energy analyses, and likely also to the FFT analysis, although I'm not very familiar with that technique. Overall, then, I don’t find these exploratory analyses very convincing as to the presence of a temporal pattern.

Overall, I think these data are valuable and contribute to the discussion around the retroactive avoidance hypothesis. I applaud the authors for engaging with the methodological critiques and taking action (e.g., pre-registration and following advice from some critics) to address them in the present project. I think the confirmatory analyses are quite solid, and my criticisms mostly involve the exploratory hypotheses and analysis which I don’t find very compelling – but they are also clearly labelled as exploratory and secondary. I’ll leave a couple more minor notes at the end.

Other notes:

Note that the pre-reg was first created in September 2013, but it wasn’t actually registered (“frozen”) until July 2, 2014. It looks like data collection started in 2013 and ~260 participants were collected prior to that registration. I think the authors should clarify this point (e.g., why wasn’t it registered before data collection started, and were any data examined before the registration?)

Page 3. Please indicate what effect size this is (cohen's d? pearson correlation?), and the exact sample size for ease of interpretation.

Thanks,

Rick Klein

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Remi Gau

Reviewer #2: No

Reviewer #3: Yes: Rick Klein

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 31;15(8):e0238373. doi: 10.1371/journal.pone.0238373.r002

Author response to Decision Letter 0


11 May 2020

Dear Dr. Dr. Naudet,

dear Reviewers,

we would like to thank you for your valuable comments on our manuscript entitled “A Preregistered Multi-Lab Replication of Maier et al. (2014, Exp. 4) Testing Retroactive Avoidance” submitted for publication to PLOS ONE. We tried to respond carefully to every comment the editor and the three reviewers made. Below you will find our answers to your concerns and a description of how we adjusted the text. We also added some analyses.

We included the original comments in this letter. Please check our responses to every comment after **.

We are looking forward to your response.

Sincerely,

Markus Maier and co-authors

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

**OK.

2. Thank you for including your ethics statement: "All research presented in this article involved human participants, and the protocol was approved by the respective ethical boards of all participating universities.

Written consent was obtained from all participants."

a) Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study.

**We added the full names of all boards that approved this study to the “Methods” section.

b) Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”).

**OK, done.

Comments to the Author

5. Review Comments to the Author

Reviewer #1: In this article Maier and colleagues attempt a direct replication of a pre-cognition study published in 2014.

This study tries to establish if the subliminal presentation of a picture with emotionally negative content presented at a given time can reduce participants’ responses to select which picture (negative VS neutral) to view EVEN BEFORE it has been decided which “key press” will be associated with which type of picture.

This work involved 5 different labs and is based on a large sample of participants acquired over several years.

This study was preregistered and the results contain a fairly clear distinction between the confirmatory part (the replication proper) and its exploratory part.

The confirmatory analysis is based on the use of a Bayes factor to measure the relative strength of the evidence in favor of the null hypothesis (future presentation of negative stimuli has no effect on the choice of which key will be associated with which stimulus) compared to the alternative hypothesis (future presentation of negative stimuli reduces the chance that the “selected key” will be associated with a negative stimulus, and thus should lead to a lower-than-chance proportion of aversive stimuli being presented). The prediction being directional, the authors use a directional Bayesian one-sample t-test.

Data acquisition was stopped before reaching the pre-registered thresholds because of lack of resources. Based on the available data and some post-hoc analysis the authors conclude that a true null effect was detected.

The exploratory part of the analysis relies on trying to find anomalies in the time series generated by the sequential analysis Bayes factor and by relying on a frequentist approach and using simulations to generate the null distribution.

Part of the data presented is available on OSF.

In general the manuscript is fairly clear, and I generally agree with the results of the confirmatory analysis. The exploratory part presents one major issue (lack of correction for multiple comparison) that should be fixed (see the exploratory section below). I also have some suggestions to make both on the form and the content of the manuscript.

MAIN ISSUES

- lack of correction for multiple comparison in the exploratory analysis

- too much emphasis put on the very speculative theoretical underpinnings of the work

Exploratory analysis

In this part the authors apply what amounts to a frequentist approach to detect surprising results. They rely on simulations to produce the null distribution against which to test their data.

I suggest that the authors state the statistical threshold they use to establish significance. It seems to be 5% but I do not see it mentioned in the manuscript.

**We added a corresponding statement in the Results section (line 393).

I suspect that the first 2 exploratory analyses done by the authors are partly redundant and do not provide independent lines of evidence. Curves with higher Bayes factors will depart further from the BF=1 line and hence have a larger area. I recommend that, to better visualize this, the authors use a scatter plot of the simulations for “maximum BF VS area” and display where the data from this study actually lie in this scatter plot.

**This is true. These first two analyses are not completely independent, as reviewer 1 outlined above. However, occasionally some time series do not exhibit extreme BFs but do reach a high level of energy, and vice versa. To cover these cases, both analyses have been provided. As suggested, we added a scatter plot (see Figure 4) to show how both methods are related to each other and to locate the human data time series within the cloud of simulated time series. Also, in the combined score the human data are outstanding (p = .0104; that is, only 104 of 10,000 simulations reach or exceed the same combined score).
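The simulation-based p value quoted here is the share of simulated scores that reach or exceed the observed one. A minimal sketch follows, with a hypothetical function name and a placeholder null distribution constructed only so that the count matches the 104 of 10,000 reported above; the numbers are not the study's combined scores.

```python
def monte_carlo_p(observed: float, simulated: list) -> float:
    """Share of simulated scores that reach or exceed the observed score."""
    n_extreme = sum(1 for score in simulated if score >= observed)
    return n_extreme / len(simulated)


# Placeholder null distribution: 10,000 made-up scores (0, 1, ..., 9999), chosen
# so that exactly 104 of them reach or exceed the placeholder observed score.
simulated_scores = list(range(10_000))
observed_score = 9_896
p_value = monte_carlo_p(observed_score, simulated_scores)
print(p_value)
```

A common variant adds one to both the count and the number of simulations so that the estimate can never be exactly zero; the plain ratio is used here because it matches the figure given in the response.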

The main issue in this section lies in the 3rd analysis and, to a similar extent, in the 1st.

The 3rd analysis tests 1164 frequencies and in consequence, unless I missed something, shows a need for multiple comparison correction. However, I am unclear as to how the authors corrected for this. Without this the risk of false positives might be clearly inflated. Because of the smoothness of the data, 2 neighboring frequencies are unlikely to be independent data points, so I suspect that applying a Bonferroni correction with a significance threshold of 0.000042955 (0.05 / 1164) will be deemed overly conservative. If so, I suggest that the authors look for inspiration in the statistical methods literature related to EEG to find ways to best correct their tests.

**In our view, alpha inflation is not a relevant issue in the time series analyses since the human data and each of the simulations are treated in the same fashion and thus contain the same amount of potential inflation. If we compare a potentially inflation-biased human data test set with a distribution of equally treated simulation data test sets, this null distribution also contains the inflation bias. If our human data are found at the extreme end of this null distribution, they must be significantly different from the simulations, and thus from chance, while controlling for inflation bias (or keeping it constant).

**Let us explain this for the FFT analysis (but our line of reasoning applies in the same way to the other two analyses as well). The FFT calculates the respective amplitude of several frequencies for a given human data set. In a first step, these amplitudes are tested for being extreme by comparing the amplitudes of the respective frequencies with all amplitudes of the same frequencies obtained from the simulations that were also analysed by the FFT. Some of the frequencies in the human data are then declared significant, a certain proportion of those by chance due to the many frequencies that have been tested this way and thus due to the inflation bias. In a second step, 1,000 additional simulations are now individually tested against all the others in the same way, obtaining a number of significant frequencies for each of the 1,000 simulations (also very likely inflated due to the number of tests performed). By combining all 1,000 simulation results, the (inflated) numbers of significant frequencies for all the simulated data sets now constitute the null distribution for the final crucial test. If the number of significant frequencies found within the human data time series is at the extreme end of this distribution (that also controls for the inflation bias factor), a significant oscillation pattern was found.

**Although inflation bias is not a problem as outlined above, to circumvent the multiple testing approach in the FFT analysis described, we propose a different and more elegant method for comparing the FFT results from the human data with the FFTs from the 10,000 simulated data sets:

**To test the FFT results from the human data against chance occurrence, all 1,164 amplitudes obtained from the FFT of the human data set were added up, creating a sum score of the amplitudes obtained from all tested frequencies of this set. In the same way, the sum score of amplitudes was computed for each of the 10,000 simulations. The distribution of the sum scores of amplitudes across all simulations now served as the null distribution (see Figure 5). The sum score of amplitudes of the human data set was Sum_amp = 34.84. Only 147 of the simulations (1.47%) reached a sum score of 34.84 or higher. We added this new analysis at the end of the “Exploratory analysis” section and replaced the old one.
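A minimal sketch of the sum-score approach, under stated assumptions: the time series is a plain one-dimensional array, amplitudes are taken from NumPy's real FFT with the zero-frequency bin dropped, and they are scaled by the series length (the scaling used in the manuscript is not stated here). The series length of 2,328 is hypothetical, chosen only because it yields 1,164 non-zero frequency bins, and the random walks stand in for the simulated data sets.

```python
import numpy as np

def amplitude_sum(series):
    """Sum of FFT amplitudes over all non-zero frequency bins."""
    series = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(series - series.mean())
    # Drop the zero-frequency bin; dividing by the length is an assumed convention.
    return float(np.abs(spectrum[1:]).sum() / series.size)

def exceedance_proportion(observed, null_values):
    """Share of null values that are at least as large as the observed value."""
    return float(np.mean(np.asarray(null_values, dtype=float) >= observed))

rng = np.random.default_rng(seed=1)
n_points = 2328  # hypothetical length; gives 1,164 non-zero frequency bins

# Stand-in "simulations": random walks, not the simulated data sets of the study.
null_sums = [amplitude_sum(np.cumsum(rng.normal(size=n_points))) for _ in range(1000)]
observed_sum = amplitude_sum(np.cumsum(rng.normal(size=n_points)))

print(exceedance_proportion(observed_sum, null_sums))
```

With the figures reported in the response, 147 of 10,000 simulated sum scores at or above the observed value correspond to a proportion of 0.0147. Only one test is carried out, so no correction across frequencies is needed for this statistic.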

The first exploratory analysis suffers from a similar problem even though it is not as obvious. By drawing a 95 confidence interval and looking for data points in the BF time series that “stand out”, the authors are implicitly running a statistical test at every time point so more than 2000 statistical tests: this too requires a correction for multiple comparison. Here again the comments made in the previous paragraph regarding data smoothness and finding the right level of correction apply, but they are complicated by the fact that we are dealing with a time series and that autocorrelation might also be an issue.

**We do not think that this argument is correct. We are not testing every BF obtained in the human data sequential analyses against the corresponding BFs from the simulations. Rather we identify the single highest one and compare it with a distribution of single highest BFs found in the simulation analyses. Thus, no multiple testing is involved here.

**But in case we missed something here, our theoretical argument against the problem of inflation would apply here as well (see our response above).
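The maximum-based comparison described in this response can be sketched in a few lines. The function name and the toy numbers are hypothetical; the only point is that each series is reduced to a single number (its maximum) before any comparison takes place.

```python
def max_statistic_p_value(observed_series, simulated_series):
    """Compare the maximum of the observed series with the maxima of the
    simulated series; returns the observed maximum and the share of
    simulations whose maximum is at least as large."""
    observed_max = max(observed_series)
    simulated_maxima = [max(series) for series in simulated_series]
    n_exceeding = sum(m >= observed_max for m in simulated_maxima)
    return observed_max, n_exceeding / len(simulated_maxima)

# Toy values for illustration only (not Bayes factors from the study).
observed = [1.0, 5.0, 2.0]
simulated = [[1.0, 2.0], [6.0, 0.5], [3.0, 4.0], [5.0, 1.0]]
print(max_statistic_p_value(observed, simulated))
```

Since the maximum over all time points is compared with a null distribution of maxima, the look-elsewhere effect across time points is built into the null distribution, and the comparison counts as one test.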

On a different note, I am wondering why the author did not include an analysis similar to the harmonic oscillation approach used by Dechamps and Maier (2019)?

***The harmonic oscillation approach was criticised in a comment by Hartmut Grote (Frontiers in Psychology, Section Cognition, 2019). The curve estimation allows for too many degrees of freedom on the side of the analyst and is therefore not a sufficiently objective method to determine the oscillatory structure of a time series. Also, the significance testing we used was insufficient.

***We therefore looked for more adequate methods to explore oscillation patterns in time sequences. And apparently, the best method for analyzing discrete data of the kind we collected in our research is the FFT. It was developed for exactly that purpose.

Comments on theoretical underpinning

In general I would suggest that the authors spend a lot less time on the theoretical speculations and inspirations behind this work.

**We agree. We deleted large parts of the theoretical background description and also deleted redundancies.

I think that the authors are putting the cart before the horse, and trying to link their work to quantum mechanics. It is very premature to do so before they have even ascertained the existence of the phenomenon that they are trying to explain. I would strongly suggest that the authors move a lot of the links made to quantum mechanics (introduction line 71 to 118 ; results line 331 to 358) to the discussion section.

**See response above. We reduced the theoretical background in the intro to a minimum. We just wanted to make sure that the reader understands why we were focusing on unconscious processing.

**At the beginning of the „Exploratory analyses“ we had to describe the theoretical model that led to the proposition of an oscillating effect. We therefore kept part of the theory in place but deleted any redundancies. Otherwise the reader would not understand why we suddenly performed these post hoc analyses.

I took the liberty to consult a physicist colleague on the parts of the manuscript that were outside of my area of expertise. I am adding below some of her comments that I suggest the authors should take into account.

- l. 83 : Non locality is only ever spatial. There is no reason for it to imply a temporal bi-directionality and this idea does not appear in the mentioned references (14, 24, 26).

**Indeed 25 and 26 do not refer to temporal non-locality but to spatial non-locality, which was mentioned a few words earlier. We moved 25 and 26 to that part of the sentence. 14 does refer to temporal non-locality and we added some more references [see 27-29]:

**Tressoldi, P. E., Maier, M. A., Buechner, V. L., & Khrennikov, A. (2015). A macroscopic violation of no-signaling in time inequalities? How to test temporal entanglement with behavioral observables. Frontiers in Psychology, 6, 1061.

**Aharonov, Y., Cohen, E., & Elitzur, A. C. (2015). Can a future choice affect a past measurement’s outcome? Annals of Physics, 355, 258-268.

**Megidish, E., Halevy, A., Shacham, T., Dvir, T., Dovrat, L., & Eisenberg, H. S. (2013). Entanglement swapping between photons that have never coexisted. Physical Review Letters, 110(21), 210403.

**The physicist colleague is not completely right. Bell’s original work indeed addressed spatial non-locality (maybe the colleague is referring to this groundbreaking work) but this was later extended to temporal non-locality (see references above).

- l 85 : The cited reference does not correspond to what is mentioned in this assertion. The Emperor's New Mind is a book comparing human mathematical reasoning and Turing machines and has no link with temporal bidirectionality.

**It also addresses quantum mechanics, the measurement problem and the arrow of time and their mutual relationship.

- l. 347 : The authors seem to view entropy as a force that acts in nature to counteract specific effects, whereas it is in fact a descriptive mathematical quantity. Entropy does not exist as such, but is a sort of indicator invented to summarize several factors that contribute to a certain level of disorder of a set of molecules [...]. In a closed system, the level of disorder always increases (and along with it the entropy that quantifies it), and this is due to the low number of ordered configurations, which therefore have a low probability; it is not due to some property of the bodies or the motions involved that would be called entropy and that would condition the possible configurations. The entropy represents the increase in disorder, it does not cause it. Similarly, the increase in disorder is a sign of time passing by [...], but this increase in disorder does not cause the passing of time nor does it condition it.

**This is a good point. We treated entropy like a force which is actually wrong. What we rather wanted to say is that an unknown information-related law might underlie the no-signal theorem that leads to the oscillation pattern. We deleted the entropy speculation from the manuscript and revised the corresponding statements accordingly.

**On a side note: Penrose actually does argue that an increase in disorder causes the passing of time.

Penrose himself recognizes that his ideas on the nature of consciousness are speculative, and his ideas are considered wrong by several experts in several fields. Moreover, Orch-OR is only related to a potential role of quantum mechanics in the emergence of consciousness and does not bear any link to temporal bidirectionality.

**It explicitly does. Most straightforwardly in:

**Hameroff S. How quantum brain biology can rescue conscious free will. Frontiers in Integrative Neuroscience. 2012; 6(1): 93. https://doi.org/10.3389/fnint.2012.00093

**Hameroff is a co-author of the Orch OR model.

The GQT does not have any link with temporal bidirectionality either. And calling it a theory is premature given that it is at this stage a mere hypothesis formulated by a small group of physicists, untested and unknown to the physics community. It is also incomplete and unable to account for many of the fundamental aspects of quantum mechanics, as admitted by the GQT defenders themselves. It is at this stage very premature to base new research on it.

***Römer and von Lucadou both stated in personal communications that the GQT does provide links to temporal bidirectionality. The key construct in GQT is non-local entanglement correlations that also include temporal non-locality.

OTHER ISSUES

Data

Using the data on OSF and JASP, I was partly able to reproduce some of the results of the confirmatory part.

The lack of a data dictionary explaining what each column in the data means left me wondering why I was not getting the same results as the authors, until I realized that the number reported in the data was the number of neutral pictures presented to each participant.

Similarly I was not able to easily reproduce all the results from the “Variations across Labs” section. This mostly stems from the way the data was coded, in which there was not a uniform way to know which data point comes from which lab.

Similarly it seems that different labs had different file naming conventions and time stamping of their data files. I would suggest that an extra variable is added to keep track of the overall order in which data samples were acquired, especially given the time series aspect of the data seemed important for the exploratory analysis that the authors suggest.

***Good point. A time stamp variable has been added.

Similarly, the filenames of some samples seem to have resulted from copying other files or possibly from moving files between folders ( back2neg_neu_deutsch Version2-524-1 (2).txt ), so I would suggest that the authors add some code to their analysis pipeline to quality check and consolidate the dataset and to rule out the inclusion of any duplicates.

***We checked all files. There were no duplicates. In case the same subject code was mistakenly used twice for different subjects, E-Prime just added a “(2)” to the E-Prime result file name in order not to overwrite the original result file.
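A duplicate check of the kind the reviewer suggests could look roughly like the sketch below. It assumes the result files are plain `.txt` files in a single folder and treats two files as duplicates only if their contents are byte-for-byte identical; the function name, folder layout, and file names in the demo are hypothetical.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

def find_duplicate_files(folder, pattern="*.txt"):
    """Group files with identical content (same SHA-256 digest) and
    return only the groups that contain more than one file."""
    names_by_digest = defaultdict(list)
    for path in sorted(Path(folder).glob(pattern)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        names_by_digest[digest].append(path.name)
    return [names for names in names_by_digest.values() if len(names) > 1]

# Small demo with made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "subject-524-1.txt").write_text("trial data A")
    Path(folder, "subject-524-1 (2).txt").write_text("trial data B")  # same code, different content
    Path(folder, "subject-600-1.txt").write_text("trial data A")      # true duplicate of the first file
    print(find_duplicate_files(folder))
```

A file with a “(2)” suffix but different content, as described in the response, would not be flagged by this check; only identical content is.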

Concerning the exploratory part, I would suggest that the authors also share the data resulting of the 10 000 simulated datasets to allow for a computational reproduction of the authors’ results.

**OK, we added the 10,000 simulated data sets to the OSF.

Similarly, and I might be mistaken about this, but I think that ALL 327 data points from Maier 2014 are actually accessible and not just the 324 reported in the manuscript. I was able to download them by using the information actually provided in Maier 2014 and get a plot of the sequential analysis similar to figure 3 of the manuscript. To spare future readers this bit of data archaeology, I would suggest that the results from Maier 2014 are either mentioned (with a link) on the OSF project or, better, added to the OSF project related to this paper (if only because of the long shelf life of things put on OSF compared to that of some institutional or personal repository).

**OK, this can be done. We added the raw data to the OSF.

**We need to emphasize that the data from these three participants cannot be used for the temporal analyses described in our manuscript. The problem with the three corrupt eprime_result files is that we do not have time stamps for these three participants. Although we do have their mean scores, as the reviewer correctly states, we cannot place them at the correct (!) temporal position at which their data were collected (the Maier et al. 2014 subject numbers do not reflect the exact temporal order of data collection). Since our main argument concerning the temporal analyses is based on the exact temporal ordering of the participants’ data during data collection, we decided to omit these three data points from the temporal analyses. Given the sample size, we do not think that including these data (even if it were possible) would make a notable difference.

Code

For the exploratory part I would suggest that the authors make available the R code used to analyze the data and generate their figures, not only for transparency, but for other future researchers who might want to further explore those results to evaluate whether they are worth investing their time and effort to replicate.

Similarly and to simplify the work of future replications, I would suggest that the authors share the E-prime and jsPsych code to their experiment.

**OK , done.

Pre-registration

I noted some deviations between the pre-registration and the manuscript. I would suggest that the authors cross-check the content and flag any deviations they might spot with the reason behind them. The authors can take some inspiration from the SMART preregistration format: https://osf.io/6vhyt/

From what I can see it seems that the multi-site nature of this experiment was decided after the pre-registration. I suggest the authors give more details on what measures were taken to ensure the same procedure were applied in all laboratories.

**Indeed that prereg does not specify the study as a multi-site experiment (but it does also not exclude this possibility). We made a corresponding statement in the text (line 125).

**During the course of data collection we realized that it would have taken too much time to run this study purely on our own. We therefore invited colleagues who might be interested in this project to help us with data collection in their labs.

**Through extensive personal (phone or meetings) and email communication we made sure that the procedure was exactly the same as the one performed in our lab. We also sent written instructions for experimenters to our collaborators to make sure that they closely followed our procedure. The instruction sheet is now uploaded at OSF.

**We made a few corresponding statements at the end of the introduction section that describe our efforts to ensure that the same procedure was applied in all participating labs.

**We think extending data collection to a multi-lab project strengthens rather than weakens the scientific value of the study. We do not consider this shift in data collection strategy a deviation from the original preregistration.

**Apart from slight adjustments of the presentation times chosen (see below), we are not aware of any other deviations from the specifications made in the preregistration.

The stimuli and mask presentation time are slightly different between the pre-registration and the manuscript. Is this because the refresh rate of the monitor was different?

**Yes, this was the reason. We are convinced that these adjustments are not responsible for the null effect (and nobody would argue this way).
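Why a refresh rate change forces slightly different presentation times can be illustrated with a small calculation: a display can only show a stimulus for a whole number of frames. The sketch below uses hypothetical durations and refresh rates, not the values from the preregistration or the manuscript.

```python
def achievable_duration_ms(requested_ms, refresh_hz):
    """Round a requested duration to the nearest whole number of frames
    (at least one frame) and return (frames, actual duration in ms)."""
    frame_ms = 1000.0 / refresh_hz
    frames = max(1, round(requested_ms / frame_ms))
    return frames, frames * frame_ms

# Hypothetical example: the same requested duration on two monitors.
for refresh_hz in (60, 100):
    frames, actual_ms = achievable_duration_ms(30, refresh_hz)
    print(f"{refresh_hz} Hz: {frames} frame(s), {actual_ms:.2f} ms")
```

This only describes the nominal timing. Whether the hardware actually delivers it is a separate question that requires an external measurement.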

Also, as a side note, and because short presentation times seem important in the current study as the stimuli should be presented subliminally: stimulus presentation timings are notoriously inaccurate on Windows machines, and the presentation times reported by experimental software should rarely be taken at face value. I would strongly urge the authors, in future work, to measure timing externally with a photodiode and report the actual timing accuracy they can achieve on their set-up.

**Yes, this is a very good point and should also be addressed before any preregistrations in future projects.

The robustness checks of the results using different priors and comparing the results across labs were not part of the original pre-registration.

**Yes, and we clearly labeled them as additional (i.e., non confirmatory) analyses. We consider them to add some important information about how to interpret the main finding.

So in general I would suggest that the confirmatory and exploratory parts of the results sections be made even more clear:

1. “Main analysis” should be named “Confirmatory analysis”

Report the pre-registered analysis

1.1. Additional analysis

Report robustness and across labs comparisons

2. “Temporal change across time” should be named “exploratory analysis”

**We followed this advice and used the proposed headlines in the revised manuscript.

The actual time stamp of the pre-registration is 2014-07-02 07:19 PM, so several months after the experiment was started and after 260 participants were tested. This is not necessarily an issue by itself but it should be mentioned for transparency.

**Done. We added the following statement:

“It has to be noted that the text of the preregistration has been created and stored before the beginning of the data collection (in September 2013), however the first author was not aware of the fact that it also had to be frozen to complete the procedure. This was done a few months later (in July 2014) without changing anything of the original text. So basically the preregistration was fully finalized after data from 260 participants had already been collected and were inspected for the first time. In addition, the study originally was not explicitly preregistered as a multi lab project (but did not exclude this option either).”

Figures / results consistency

Most of the results in the text are expressed as BF01 but the graphs display BF10. Consistency among the 2 would be good.

**We checked for consistency and adjusted the text when needed.

Minor comments.

In some places the authors use the word “power” where “sample size” would be more appropriate : eg. line 318.

**We changed the wording as suggested.

Reviewer #2: This is a nice and detailed study. The authors undertook a careful exercise to test the replicability of previous findings from a retroactive avoidance effects study, now with a significantly larger sample size, and a well thought out Bayesian sequential testing approach. The authors relied on the Bayes Factor (BF), clearly the best index for evidence detection in the Bayesian world (if the BF can be efficiently computed, at all). The scientific (statistical) premise looks adequate. In the end, they found that I do have some comments, and clarifications.

1. With the purpose of reaching a wider audience, the writing style is pretty verbose. Maybe it can be made more crisp?

**We agree and deleted large portions of the text in the introduction. We also checked for redundancies and deleted them (see our response to a similar request from reviewer 1).

2. I understand the sequential approach may not start with reaching a fixed sample size. But, what's the effect size we are looking into (not clearly specified)? The writeup, on page 13, directly goes into the prior specification of $\delta$ to be Cauchy; some argument is necessary for the reader on the context behind that, and citing some literature is not enough.

**The effect size of the original study, which we intended to replicate, was Cohen's d = .1. The specification of the Cauchy distribution used as prior in the replication attempt was based on this initial effect size. We mention this now in the revision (lines 272-276).

3. In the "variation study" across labs in various countries, the sample size of Germany is considerably higher than the others. With more data, there is the possibility that it will influence the posterior. I don't see any discussion in this regard while explaining Table 1.

**We added a corresponding statement at the end of the “Additional analysis“ section.

4. Manuscript needs to be checked for inconsistencies in spelling, such as "Psychical" research, should be "Physical".

**Could not find the term psychical in the text.

Reviewer #3: Overall, the manuscript presents rigorously collected data and provides good context for the project, including relevant methodological critiques. The authors engaged with these critiques and took appropriate steps to address them, such that the present project has several methodological advantages (pre-registration, open data, and a large multi-national dataset). The project presents a challenge in that many researchers (myself included) would be skeptical a priori of the retroactive avoidance hypothesis, and admittedly I am no expert in many of the topics being presented (quantum mechanics, etc.). However, I followed the developments surrounding the Bem paper quite closely, and overwhelming data could force a re-examination of those beliefs.

A major selling point of the current project, then, is being pre-registered and presenting open data (although, I think a Registered Report – guaranteeing publication regardless of outcome and thus eliminating publication bias -- might be necessary for such a controversial topic). The pre-registration does a good job laying out the procedure, methods, and decision rules for the analysis. It goes a long way to constrain researcher degrees of freedom, but there remain a few places relating to the analysis and treatment of data that could be further constrained in future efforts. For example, how will the 60 responses per participant be handled? What is the specific statistical test being used? Are there any provisions for missing data or exclusions? Will data from all labs be combined into one and analyzed together, or will meta-analysis or similar be used? These dimensions all introduce possible flexibility in data analysis; that said, for the present project I don't see this as a big concern because the reported analyses follow pretty straightforwardly from what is described in the pre-reg.

**How the 60 trials per participant were handled followed the exact protocol of the original study. That is, a mean score for each participant was computed and subjected to the Bayesian one sample t-test analysis in a sequential way. We added a corresponding statement at the beginning of the “Main analysis“ section.

**We did not specify any exclusion criteria other than basic vision abilities and age of participants being at least 18 years. We added a corresponding statement in the “Participants“ section.

**From the Bayesian analysis procedure it followed that any newly collected data would be transferred into the sequential BF analysis in the temporal order in which the data were collected. No meta-analytic methods were planned.
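The data handling described in these responses (one mean score per participant, then a test statistic recomputed as each participant is added in collection order) can be sketched as follows. Two things are assumptions made for the example: trials are coded 1/0 with a chance level of 0.5, and a classical one-sample t statistic stands in for the Bayes factor, which would normally come from a dedicated package such as JASP or an equivalent library. Function names are hypothetical.

```python
import math
from statistics import mean, stdev

def participant_score(trials):
    """Collapse one participant's trials (assumed coded 1/0) into a mean score."""
    return mean(trials)

def one_sample_t(scores, mu0):
    """Classical one-sample t statistic; a stand-in for the Bayes factor."""
    n = len(scores)
    return (mean(scores) - mu0) / (stdev(scores) / math.sqrt(n))

def sequential_statistic(scores_in_order, mu0, min_n=2):
    """Recompute the statistic after each added participant, keeping the
    temporal order of data collection."""
    return [
        one_sample_t(scores_in_order[:n], mu0)
        for n in range(min_n, len(scores_in_order) + 1)
    ]

# Made-up scores for four participants, already sorted by time stamp.
scores = [0.4, 0.6, 0.5, 0.7]
print(sequential_statistic(scores, mu0=0.5))
```

Replacing `one_sample_t` with a function that returns a Bayes factor for the same subset of scores would give the sequential BF curve; the ordering logic stays the same.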

The procedures and results are well described in the manuscript, however I think it’s very important to upload the files themselves to the OSF for reproducibility and verification (the eprime files, and syntax for the analysis if available). Because the data collection stopped earlier than planned, the resulting estimates leave perhaps some room for debate. However, from my perspective this is quite compelling evidence for an overall null effect and I’m not sure collecting additional participants would be warranted. I also think variation between sites would be conveyed better through a meta-analysis (P 15), instead of eye-balling the results from different sites. This would give you metrics for heterogeneity (tau, Q, and I^2) indicating the level of heterogeneity between sites. But, this is not too central to the paper and likely would result in the same conclusion (e.g., I suspect you’d get heterogeneity estimates that don’t exceed chance). Overall, I found the analysis and presentation of the confirmatory results quite solid.

**Good point. We added the files and documents to the OSF.

**We also calculated a meta-analysis (random-effects model) across labs. Indeed the heterogeneity estimates did not exceed chance. We added a corresponding paragraph in the “Variations across Labs“ section (a fixed-effects model produced the same result).
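The heterogeneity metrics the reviewer mentions (Q, I², tau²) can be computed from per-lab effect sizes and their sampling variances. The sketch below uses the standard Cochran's Q and the DerSimonian-Laird estimator of tau²; whether the authors used this estimator is not stated, and the input values are invented for illustration.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (in percent) and DerSimonian-Laird tau^2
    for a set of study-level effects and their sampling variances."""
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    pooled_fixed = sum(w * y for w, y in zip(weights, effects)) / w_sum
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"Q": q, "df": df, "tau2": tau2, "I2": i2, "pooled_fixed": pooled_fixed}

# Invented per-lab effect sizes and variances (not the values from Table 1).
print(heterogeneity([0.05, -0.02, 0.10, 0.00], [0.004, 0.010, 0.020, 0.015]))
```

Under homogeneity Q follows approximately a chi-square distribution with `df` degrees of freedom, so a Q close to or below `df` (and hence tau² and I² of zero) is what "heterogeneity estimates that do not exceed chance" looks like in these terms.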

Perhaps the area I would critique is the exploratory analyses and the logic underlying the temporal change hypothesis (P 16). To me, the logic behind this argument does not seem very compelling. It seems entropy would have to have “memory” to correct the initial violation of thermodynamics with a second violation of the same law in the opposite direction. This bears a surface level similarity to the gambler’s fallacy, although perhaps I’m not understanding on a deep level. In practice, then, participants in earlier stages of the experiment would show retroactive avoidance, whereas participants in later stages of the experiment (or in replications) would be showing the opposite pattern? I realize this is a hypothesis generated since the project began, but if this is the case you could find robust evidence for this retroactive avoidance effect by instead running a Registered Report using a novel paradigm. Regardless, as the authors note this is quite a challenging hypothesis in terms of the implications for current scientific thinking and methods, and would go against much of our current understanding, so I think there would need to be pretty compelling evidence to validate it. But, I’m also not sure it’s my place to be critiquing these explanations or deeming them plausible or implausible given my lack of expertise in the topics being discussed.

I’ll propose that it seems much more plausible that the observed pattern of significant original studies followed by replication failures could be produced if there is zero retroactive avoidance effect in the population. The original significant observations were possibly due to some combination of chance occurrence, flexibility in data analysis, and publication/file drawer bias (note: I think it’s quite easy for all of these things to occur even with the best intentions unless you specifically safeguard against them – which is why it’s so important that the authors pre-registered the present project) – and the replication failures are simply correctly identifying a true null effect. This explanation has, in general, been demonstrated as quite plausible in simulations (e.g., Simmons et al., 2011), and indeed is a major driving force in terms of the current overhaul in methods in social psychology and other fields. The authors acknowledge this possibility, and pre-registered the present project to safeguard against this.

**We agree, this is a highly relevant counterargument/alternative interpretation of this finding. We added this statement to the discussion section (which can be credited to reviewer 3 if he so desires).

Then, admittedly I had a pretty high bar for evidence going into the exploratory analyses. The authors appropriately identified these analyses as exploratory, and provided suitable cautions about how readers should interpret them.

Exploratory analyses: (P18-20) With the data being arranged temporally, this high BF is found in the data from the Maier et al., study 4 data, that I understand was not pre-registered? This is, I think, what we would expect given that the earlier study reported a significant result. The problem is that we don't know if this original result may have suffered from some of the classic problems we're concerned about in non-preregistered projects (see above). We're assuming the data from the current replication do not suffer from those issues due to the pre-registration (but, note a Registered Report is the only real way to guarantee no publication bias).

Therefore I'm not sure if this analysis is too convincing for a temporal pattern, so much as a pattern where the pre-registered study found smaller (null) effects? There is some discussion of this issue in the RPP paper (OSC, 2015) and this seems more plausible from my perspective. This concern would apply to both Maximum BF and BF energy analyses, and likely also to the FFT analysis, although I'm not very familiar with that technique. Overall, then, I don’t find these exploratory analyses very convincing as to the presence of a temporal pattern.

**Good point, see our response above.

Overall, I think these data are valuable and contribute to the discussion around the retroactive avoidance hypothesis. I applaud the authors for engaging with the methodological critiques and taking action (e.g., pre-registration and following advice from some critics) to address them in the present project. I think the confirmatory analyses are quite solid, and my criticisms mostly involve the exploratory hypotheses and analysis which I don’t find very compelling – but they are also clearly labelled as exploratory and secondary. I’ll leave a couple more minor notes at the end.

Other notes:

Note that the pre-reg was first created in September 2013, but it wasn’t actually registered (“frozen”) until July 2, 2014. It looks like data collection started in 2013 and ~260 participants were collected prior to that registration. I think the authors should clarify this point (e.g., why wasn’t it registered before data collection started, and were any data examined before the registration?)

**We explain this inconsistency in the text:

“It has to be noted that the text of the preregistration had been created and stored before the beginning of the data collection (in September 2013), however the first author was not aware of the fact that it also had to be frozen to complete the procedure. This was done a few months later (in July 2014) without changing anything of the original text. So, basically, the preregistration was finalized after data from 260 participants had already been collected and were inspected for the first time. In addition, the study originally was not explicitly preregistered as a multi lab project (but did not exclude this option either).”

Page 3. Please indicate what effect size this is (cohen's d? pearson correlation?), and the exact sample size for ease of interpretation.

**It was Cohen’s d = .1, based on the 327 subjects of the original study. We mention this now in the “Results” section.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Florian Naudet

15 Jun 2020

PONE-D-20-02005R1

A Preregistered Multi-Lab Replication of Maier et al. (2014, Exp. 4) Testing Retroactive Avoidance

PLOS ONE

Dear Dr. Maier,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Thank you for your good job in revising the paper. Many thanks to the 4 reviewers who assessed this new draft. As you will see, there are minor changes to be made as suggested by "reviewer 1". Please take these comments into account. 

Please submit your revised manuscript by Jul 30 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Florian Naudet, M.D., M.P.H., Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: First I would like to apologize for the time it took me to do this second round of review: this should have been much faster.

I would like to thank the authors for their clarifications, all the reworking and additions to the manuscript. It makes it clearer where the line between results and interpretation lies.

Updating the code and the data (from this and the previous study) on OSF also clearly adds value.

My two main suggestions are:

- display for figure 3 the null distribution on which the statistical inference is performed

- add a readme to the OSF project and ensure that the code runs as intended when downloaded from source

---

**Although inflation bias is not a problem as outlined above, to circumvent the multiple testing approach in the FFT analysis described, we propose a different and more elegant method for comparing the FFT results from the human data with the FFTs from the 10,000 simulated data sets:

**To test the FFT results from the human data against chance occurrence, all 1,164 amplitudes obtained from the FFT of the human data set were added up creating a sum score of the amplitudes obtained from all tested frequencies of this set. In the same way for each of the 10,000 simulations the sum score of amplitudes was computed. The distribution of the sum scores of amplitudes across all simulations now served as the null distribution (see Figure 5). The sum score of amplitudes of the human data set was Sum_amp = 34.84. Only 147 of the simulations (1.47%) reached a sum score of 34.84 or higher. We added this new analysis at the end of the “Exploratory analysis” section and replaced the old one.
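The sum-score comparison described above can be sketched as a Monte Carlo test. This is only an illustration under stated assumptions: the function names `amplitude_sum` and `monte_carlo_p`, the amplitude scaling, and the Gaussian stand-in series are hypothetical and are not taken from the authors' OSF scripts.

```python
import numpy as np

def amplitude_sum(series):
    """Sum of FFT amplitudes over all non-zero frequencies of a 1-D series.

    The scaling by the series length is an assumption; the original
    scripts may normalise amplitudes differently.
    """
    series = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(series - series.mean())
    amplitudes = np.abs(spectrum)[1:] / len(series)  # drop the zero-frequency term
    return float(amplitudes.sum())

def monte_carlo_p(observed, null_scores):
    """Share of simulated scores that are at least as large as the observed one."""
    null_scores = np.asarray(null_scores, dtype=float)
    return float(np.mean(null_scores >= observed))

# Illustrative sizes and stand-in data (not the study's data).
rng = np.random.default_rng(0)
n_points, n_sims = 256, 1000
observed_series = rng.normal(size=n_points)
null_scores = [amplitude_sum(rng.normal(size=n_points)) for _ in range(n_sims)]
p_value = monte_carlo_p(amplitude_sum(observed_series), null_scores)
```

With the counts reported in the text, 147 of 10,000 simulated sum scores at or above 34.84, the same function returns 147 / 10,000 = 0.0147.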

**We are not testing every BF obtained in the human data sequential analyses against the corresponding BFs from the simulations. Rather we identify the single highest one and compare it with a distribution of single highest BFs found in the simulation analyses. Thus, no multiple testing is involved here.
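The maximum-based comparison in this paragraph can be sketched in the same way: one maximum per curve, so only a single comparison is made. The sketch below is an assumption-laden illustration. A running z-score of simulated 0/1 outcomes stands in for the sequential Bayes factor, and the names `running_z` and `max_statistic_p` are hypothetical.

```python
import numpy as np

def running_z(hits):
    """Running z-score of a 0/1 sequence against a 50% chance rate.

    Used here only as a stand-in for a sequential Bayes factor curve.
    """
    hits = np.asarray(hits, dtype=float)
    n = np.arange(1, len(hits) + 1)
    return (np.cumsum(hits) - 0.5 * n) / np.sqrt(0.25 * n)

def max_statistic_p(observed_curve, simulated_curves):
    """Compare the observed maximum with the distribution of simulated maxima."""
    observed_max = np.max(observed_curve)
    null_maxima = np.array([np.max(curve) for curve in simulated_curves])
    # One number per simulation, one comparison for the observed data.
    return float(np.mean(null_maxima >= observed_max)), null_maxima

# Illustrative sizes and stand-in data (not the study's data).
rng = np.random.default_rng(0)
n_trials, n_sims = 500, 1000
observed_curve = running_z(rng.integers(0, 2, size=n_trials))
simulated_curves = [running_z(rng.integers(0, 2, size=n_trials)) for _ in range(n_sims)]
p_value, null_maxima = max_statistic_p(observed_curve, simulated_curves)
```

Because each simulated data set contributes exactly one value to the null distribution, the resulting proportion already accounts for the search over all time points within a curve.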

**But in case we missed something here, our theoretical argument against the problem of inflation would apply here as well (see our response above).

---

I thank the authors for explaining this more thoroughly. This helped me understand why I got confused and thought there was a multiple comparison problem. The text did indeed mention what null distribution was used to make the statistical inference. However the figure of the previous version of the article showed a 95% confidence interval envelope that a) was unrelated to the inference made in the text, b) would actually suffer from multiple comparison problem if it were used as a way to threshold and detect extreme data.
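Point b) can be illustrated with a small simulation. It is a sketch under assumptions: Gaussian random walks stand in for the simulated evidence curves, and the sizes are arbitrary. It shows that a pointwise 95% envelope excludes about 5% of null curves at any single time point, yet a much larger share of null curves leaves the envelope somewhere along the series.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_steps = 2000, 200  # illustrative sizes, not the study's

# Hypothetical null curves: Gaussian random walks standing in for simulated curves.
curves = np.cumsum(rng.normal(size=(n_sims, n_steps)), axis=1)

# Pointwise 95% envelope: 2.5th and 97.5th percentile at each time point.
lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
outside = (curves < lower) | (curves > upper)

# Share of (curve, time point) pairs outside the envelope: close to 0.05 by construction.
pointwise_rate = float(outside.mean())

# Share of null curves that leave the envelope at least once: far above 0.05.
familywise_rate = float(outside.any(axis=1).mean())
```

Using such an envelope as a detection threshold would therefore flag many pure-chance curves, which is why a single summary statistic with its own null distribution is the safer basis for inference.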

The new approach suggested by the author is an improvement as it is more direct and the link between the figure and the text is much clearer.

In the same spirit I would suggest that:

the authors make the text and figure 3 more congruent and add a figure (or maybe an inset to figure 3), similar to figure 5, showing where the maximum BF value of the human data lies when compared to the distribution of maximums obtained from the simulations.

Also maybe I am still missing something but if only the maximum BF value is taken for the inference how can **other** maximum BF be found in time series as suggested by this sentence:

“Higher BFs were also found with participants numbered 500 to 600 (i.e., 400 in the early stages of the replication study).”

---

The stimuli and mask presentation time are slightly different between the pre-registration and the manuscript. Is this because the refresh rate of the monitor was different?

**Yes, this was the reason. We are convinced that these adjustments are not responsible for the null effect (and nobody would argue this way).

---

Indeed I was not arguing that way. I was more trying to understand where some of the differences were coming from.

Code

But I admit that the lack of a README did not help in figuring out:

- what packages are needed

- where unzipped files should go

- in which order the scripts should be run as there seem to be some dependencies between them.

I suggest that such basic documentation should be provided.

I was unable to make some of the R code run (which might be partly due to my lack of R knowledge); ‘exploratoryanalyses.R’ stopped with the following error:

Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match

Could the authors make sure that everything runs as intended from simply downloading the OSF project and trying to run it from there? This is usually even better when performed by a colleague “naive” to the study, to ensure a bare minimum of portability and to make sure that one’s code does not just run on one’s own machine.

On that note, I would like to point to the authors (and the editors) this recent initiative that makes sure that submitted code runs as intended: https://codecheck.org.uk/

I am appending below additional comments and suggestions of my physicist colleague.

Quantum theory

l.74 : « group of theories » should be replaced by « group of interpretations » (which don’t have the level of certainty of a theory)

GQT is an interpretation, even if it has the arrogance to call itself a theory, without any experiments to support its hypotheses. It doesn’t have the consensual character of a theory, it is only supported by a few physicists and only based on interpretation from the quantum model.

OrchOR is a philosophy of mind, also based on interpretation and extrapolation from the quantum model to the mind, without any supporting experiments or direct observations. Thus, at l.76, “theory” should be replaced by “philosophy of mind”.

In my opinion, this is not the kind of foundations that one should build assumptions in other fields upon, so I would suggest that you don’t mention quantum mechanics at all. But if you really want to mention it, it should be clear in your formulation that « if these interpretations are true », then there are similarities with your subject of experiment that would indicate that there could possibly be a causal link between the two hypothetical phenomena. In addition, it should also be very clear that these interpretations are (markedly) minor ones and still debated amongst all the others that have been put forward about the quantum model.

Otherwise, there is a high risk that your article will be automatically considered by any physicist who would read it as deliberately deforming the current consensus in physics in order to match its own conclusions, which would be regrettable if physics and psychology are as interdependent as your article would suggest, because in this case this research field could only benefit from collaboration with physicists.

I understand better now what you mean by « temporal non-locality » and I was aware of the kind of phenomenon that it stands for in your article (which appears in delayed-choice experiments). The formulation is a little confusing, because « locality » etymologically refers to a « location » (in space) so maybe it would be clearer to call it « non-temporality » (this is not an official denomination). I’ve found two or three articles that use the same denomination as you do, but a few others use « temporal Bell inequalities » or « temporal entanglement ». It doesn’t actually have an official name because it’s only an aspect of the entanglement phenomenon in general.

The real experimental effect is called « delayed-choice experiments », but the choice to consider that there is a new physical phenomenon (« temporal non-locality » or else) underlying this effect depends on the interpretation. Notably in the standard interpretation it’s just an effect of observation, and in GQT it will be the same because of the role of consciousness in the observation (basically put : in delayed-choice experiments, there are unobservable events which happen before the observation that are correlated to the choice made by the observer while doing the observation later. But in GQT, only the consciousness is capable of producing a measurement, so nothing unobservable ever happens, so there is no correlation, everything is only determined when the observer arrives). This is the only interpretation of quantum mechanics that you refer to (since OrchOR is not really an interpretation of quantum mechanics but rather an application of it to the philosophy of mind). Be careful then because the effect you refer to with « temporal non-locality » doesn’t exist in the interpretation that you refer to. I would suggest that you choose to mention only one of these ideas (non-temporality or GQT) for consistency’s sake. If you’re looking for an interpretation that would consider the delayed-choice experiments as a manifestation of time bidirectionality, there is the transactional interpretation, but this one doesn’t make any reference to consciousness so it cannot be linked to the rest of your work.

I would consider the reference to non-temporality (or “temporal non-locality”) as the weakest part because if this effect is real it would typically be destroyed very quickly by decoherence (the statistical process that makes quantum effects very improbable and unstable at large scales and usual temperatures), so it would be very improbable that it could be maintained through a whole room full of air in order to carry retrocausal information all the way from the stimuli to the subject’s brain.

l.84 : « implies » should be replaced by « could be explained by the hypothesis that », because it’s not a necessary implication. As I mentioned, the delayed choice experiments can be interpreted in several ways (I'm sorry for the lack of clarity of this field of physics, we're working on it). It’s also not clear what affirmation the sources 14,27,28,29 are supposed to support at l.84 because some of them support the existence of the physical phenomenon and some of them support the link between quantum mechanics and psychology.

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Remi Gau & Alice Van Helden

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: 2020_PlosOne_Maier_review_round-2.docx

PLoS One. 2020 Aug 31;15(8):e0238373. doi: 10.1371/journal.pone.0238373.r004

Author response to Decision Letter 1


24 Jun 2020

Dear Dr. Dr. Naudet,

dear Reviewers,

we would like to thank you for your additional comments on our revised manuscript entitled “A Preregistered Multi-Lab Replication of Maier et al. (2014, Exp. 4) Testing Retroactive Avoidance” submitted for publication to PLOS ONE. We tried to carefully respond to every comment you made. In this document you will find our answers to your comments and a description of how we adjusted the text (see **). We also revised some of the figures as you suggested.

Thank you for the open and fair scientific discussion we had and your profound work on this paper.

First I would like to apologize for the time it took me to do this second round of review: this should have been much faster.

**It is fine, thank you for taking this job seriously.

I would like to thank the authors for their clarifications, all the reworking and additions to the manuscript. It makes it clearer where the line between results and interpretation lies.

Updating the code and the data (from this and the previous study) on OSF also clearly adds value.

My two main suggestions are:

- display for figure 3 the null distribution on which the statistical inference is performed

- add a readme to the OSF project and ensure that the code runs as intended when downloaded from source

**Ok, good point. We added a “readme” file and added a figure (Figure 4 a-c) that shows the null distributions of all three exploratory analyses and where the human data lie.

________________________________________

**Although inflation bias is not a problem as outlined above, to circumvent the multiple testing approach in the FFT analysis described, we propose a different and more elegant method for comparing the FFT results from the human data with the FFTs from the 10,000 simulated data sets:

**To test the FFT results from the human data against chance occurrence, all 1,164 amplitudes obtained from the FFT of the human data set were added up creating a sum score of the amplitudes obtained from all tested frequencies of this set. In the same way for each of the 10,000 simulations the sum score of amplitudes was computed. The distribution of the sum scores of amplitudes across all simulations now served as the null distribution (see Figure 5). The sum score of amplitudes of the human data set was Sum_amp = 34.84. Only 147 of the simulations (1.47%) reached a sum score of 34.84 or higher. We added this new analysis at the end of the “Exploratory analysis” section and replaced the old one.

**We are not testing every BF obtained in the human data sequential analyses against the corresponding BFs from the simulations. Rather we identify the single highest one and compare it with a distribution of single highest BFs found in the simulation analyses. Thus, no multiple testing is involved here.

**But in case we missed something here, our theoretical argument against the problem of inflation would apply here as well (see our response above).

________________________________________

I thank the authors for explaining this more thoroughly. This helped me understand why I got confused and thought there was a multiple comparison problem. The text did indeed mention what null distribution was used to make the statistical inference. However the figure of the previous version of the article showed a 95% confidence interval envelope that a) was unrelated to the inference made in the text, b) would actually suffer from multiple comparison problem if it were used as a way to threshold and detect extreme data.

**Good point, we deleted the confidence interval (see Figure 3).

The new approach suggested by the author is an improvement as it is more direct and the link between the figure and the text is much clearer.

In the same spirit I would suggest that:

the authors make the text and figure 3 more congruent and add a figure (or maybe an inset to figure 3), similar to figure 5, showing where the maximum BF value of the human data lies when compared to the distribution of maximums obtained from the simulations.

**Done, see our response above and Figure 4a-c.

Also maybe I am still missing something but if only the maximum BF value is taken for the inference how can **other** maximum BF be found in time series as suggested by this sentence:

“Higher BFs were also found with participants numbered 500 to 600 (i.e., 400 in the early stages of the replication study).”

**Good point, the statement does not match the analysis and was therefore deleted from the text.

________________________________________

The stimuli and mask presentation time are slightly different between the pre-registration and the manuscript. Is this because the refresh rate of the monitor was different?

**Yes, this was the reason. We are convinced that these adjustments are not responsible for the null effect (and nobody would argue this way).

________________________________________

Indeed I was not arguing that way. I was more trying to understand where some of the differences were coming from.

Code

But I admit that the lack of a README did not help in figuring out:

- what packages are needed

- where unzipped files should go

- in which order the scripts should be run as there seem to be some dependencies between them.

I suggest that such basic documentation should be provided.

**Done, see the “readme” document.

I was unable to make some of the R code run (which might be partly due to my lack of R knowledge); ‘exploratoryanalyses.R’ stopped with the following error:

Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match

Could the authors make sure that everything runs as intended from simply downloading the OSF project and trying to run it from there? This is usually even better when performed by a colleague “naive” to the study, to ensure a bare minimum of portability and to make sure that one’s code does not just run on one’s own machine.

**OK, we made sure that now everything works. We also made the independent test as suggested and it worked.

On that note, I would like to point to the authors (and the editors) this recent initiative that makes sure that submitted code runs as intended: https://codecheck.org.uk/

**Thank you. This is helpful!

I am appending below additional comments and suggestions of my physicist colleague.

Quantum theory

l.74 : « group of theories » should be replaced by « group of interpretations » (which don’t have the level of certainty of a theory)

GQT is an interpretation, even if it has the arrogance to call itself a theory, without any experiments to support its hypotheses. It doesn’t have the consensual character of a theory, it is only supported by a few physicists and only based on interpretation from the quantum model.

**We agree, this is an important point you make. However, a theory does not need any empirical evidence to be called a theory and consensus is not a feature of a theory either. But we understand what the reviewer wants to say here and we changed the corresponding statement as suggested.

OrchOR is a philosophy of mind, also based on interpretation and extrapolation from the quantum model to the mind, without any supporting experiments or direct observations. Thus, at l.76, “theory” should be replaced by “philosophy of mind”.

**OK.

In my opinion, this is not the kind of foundations that one should build assumptions in other fields upon, so I would suggest that you don’t mention quantum mechanics at all. But if you really want to mention it, it should be clear in your formulation that « if these interpretations are true », then there are similarities with your subject of experiment that would indicate that there could possibly be a causal link between the two hypothetical phenomena. In addition, it should also be very clear that these interpretations are (markedly) minor ones and still debated amongst all the others that have been put forward about the quantum model.

Otherwise, there is a high risk that your article will be automatically considered by any physicist who would read it as deliberately deforming the current consensus in physics in order to match its own conclusions, which would be regrettable if physics and psychology are as interdependent as your article would suggest, because in this case this research field could only benefit from collaboration with physicists.

** We hear you on this and added several statements in the text that put the interpretations into the context of all the other interpretations of QM.

**Just for the record: If someone really wants to put forward an innovative idea at some point one has to challenge the current consensus or parts of it, otherwise no progress will ever be made. Whether our approach was valid or not was put to an empirical test and we were found to be wrong. So, the current consensus remains intact. We make this very clear in our paper especially in the discussion section.

I understand better now what you mean by « temporal non-locality » and I was aware of the kind of phenomenon that it stands for in your article (which appears in delayed-choice experiments). The formulation is a little confusing, because « locality » etymologically refers to a « location » (in space) so maybe it would be clearer to call it « non-temporality » (this is not an official denomination).

**According to Einstein’s general relativity theory space and time are inseparable. So, locality as the term is used in general relativity (see also Einstein, Podolsky & Rosen, 1935) always includes both time and space.

I’ve found two or three articles that use the same denomination as you do, but a few others use « temporal Bell inequalities » or « temporal entanglement ». It doesn’t actually have an official name because it’s only an aspect of the entanglement phenomenon in general.

The real experimental effect is called « delayed-choice experiments », but the choice to consider that there is a new physical phenomenon (« temporal non-locality » or else) underlying this effect depends on the interpretation. Notably in the standard interpretation it’s just an effect of observation, and in GQT it will be the same because of the role of consciousness in the observation (basically put : in delayed-choice experiments, there are unobservable events which happen before the observation that are correlated to the choice made by the observer while doing the observation later. But in GQT, only the consciousness is capable of producing a measurement, so nothing unobservable ever happens, so there is no correlation, everything is only determined when the observer arrives).

**Yes, but consciousness can determine later what has happened before and not only what occurs during the observation.

This is the only interpretation of quantum mechanics that you refer to (since OrchOR is not really an interpretation of quantum mechanics but rather an application of it to the philosophy of mind). Be careful then because the effect you refer to with « temporal non-locality » doesn’t exist in the interpretation that you refer to.

**It does exist, both in GQT and OrchOR, as confirmed by the authors of GQT in personal communication and, with regard to OrchOR, also explicitly in the paper of Hameroff (2012):

[26] Hameroff S. How quantum brain biology can rescue conscious free will. Frontiers in Integrative Neuroscience. 2012; 6(1): 93. https://doi.org/10.3389/fnint.2012.00093

I would suggest that you choose to mention only one of these ideas (non-temporality or GQT) for consistency’s sake.

**See our response above. In addition, we are only referring to what Maier et al.’s arguments were in 2014. Thus, we would like to keep the line of arguments as it is. We already shortened it to a minimum of information.

If you’re looking for an interpretation that would consider the delayed-choice experiments as a manifestation of time bidirectionality, there is the transactional interpretation, but this one doesn’t make any reference to consciousness so it cannot be linked to the rest of your work.

**Thank you for the hint. And yes, we know about TI and there are also attempts to modify it to include consciousness, but we did not want to make the theoretical part bigger (and more speculative) than absolutely needed.

I would consider the reference to non-temporality (or “temporal non-locality”) as the weakest part because if this effect is real it would typically be destroyed very quickly by decoherence (the statistical process that makes quantum effects very improbable and unstable at large scales and usual temperatures), so it would be very improbable that it could be maintained through a whole room full of air in order to carry retrocausal information all the way from the stimuli to the subject’s brain.

**Yes, unlikely but not impossible, that is the point. Our kind of work tried to figure out those boundary conditions under which temporal non-locality could be observed macroscopically. Nowadays, superposition states of relatively large objects are found that, given decoherence, would have been considered highly unlikely years ago.

l.84 : « implies » should be replaced by « could be explained by the hypothesis that », because it’s not a necessary implication. As I mentioned, the delayed choice experiments can be interpreted in several ways (I'm sorry for the lack of clarity of this field of physics, we're working on it). It’s also not clear what affirmation the sources 14,27,28,29 are supposed to support at l.84 because some of them support the existence of the physical phenomenon and some of them support the link between quantum mechanics and psychology.

**OK, we made the suggested change. The references address both temporal non-locality in purely physical and psycho-physical contexts and thus

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Florian Naudet

7 Aug 2020

PONE-D-20-02005R2

A Preregistered Multi-Lab Replication of Maier et al. (2014, Exp. 4) Testing Retroactive Avoidance

PLOS ONE

Dear Dr. Maier,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I would like to thank the reviewers for their help in assessing this important manuscript. Both suggested accepting it in its current form and I'm pleased to say that I agree, provided minor changes are made in the abstract.

I propose to use a structured abstract (i.e. more focused on the study methods and results).

Importantly, I ask you to delete any discussion about post-hoc analyses in the abstract.

I propose to add a few words in the abstract to highlight the main strengths and also limitations of the paper.

Please submit your revised manuscript by Sep 21 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.


We look forward to receiving your revised manuscript.

Kind regards,

Florian Naudet, M.D., M.P.H., Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear authors and editors,

As far as I can tell, all my comments and concerns have been addressed.

All the supplementary material is now sufficiently documented to reproduce the results.

Figures and the analysis description now match.

The wording of the discussion has been appropriately hedged.

I have no further requests to make.

I thank the authors for their patience.

Best regards

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Remi Gau

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 31;15(8):e0238373. doi: 10.1371/journal.pone.0238373.r006

Author response to Decision Letter 2


11 Aug 2020

Dear Dr. Dr. Naudet,

We would like to thank you for your additional comments on our revised manuscript entitled "A Preregistered Multi-Lab Replication of Maier et al. (2014, Exp. 4) Testing Retroactive Avoidance", submitted for publication to PLOS ONE. We have tried to respond carefully to the comments you made concerning the abstract. In this document you will find our answers to your comments and a description of how we adjusted the text (see **).

We thank you and the reviewers again for the open and fair scientific discussion and for your thorough work on this paper.

Comment of the editor

I propose to use a structured abstract (i.e. more focused on the study methods and results).

**We shortened the theoretical part at the beginning and focused more exclusively on "retroactive avoidance". We also described the design and the results in more detail.

Importantly, I ask you to delete any discussion about post-hoc analyses in the abstract.

**We deleted any discussion about the post-hoc findings.

I propose to add a few words in the abstract to highlight the main strengths and also limitations of the paper.

**We highlighted the strengths and limitations of the study at the end of the abstract.

Sincerely,

Markus Maier et al.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 3

Florian Naudet

17 Aug 2020

A Preregistered Multi-Lab Replication of Maier et al. (2014, Exp. 4) Testing Retroactive Avoidance

PONE-D-20-02005R3

Dear Dr. Maier,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log in to Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double-check that your user information is up to date. If you have any billing-related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Florian Naudet, M.D., M.P.H., Ph.D.

Academic Editor

PLOS ONE

Acceptance letter

Florian Naudet

20 Aug 2020

PONE-D-20-02005R3

A Preregistered Multi-Lab Replication of Maier et al. (2014, Exp. 4) Testing Retroactive Avoidance

Dear Dr. Maier:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Pr. Florian Naudet

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: 2020_PlosOne_Maier_review_round-2.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The data underlying the results presented in the study are available from https://osf.io/yqqfz/files/


