PLOS One. 2022 Oct 3;17(10):e0272796. doi: 10.1371/journal.pone.0272796

Reduced choice-confidence in negative numerals

Santiago Alonso-Díaz 1,*, Gabriel I Penagos-Londoño 1
Editor: Federico Giove
PMCID: PMC9529092  PMID: 36190954

Abstract

Negative numbers are central in math. However, they are abstract, hard to learn, and manipulated more slowly than positive numbers, regardless of math ability. This suggests that confidence, namely the post-decision estimate of being correct, should be lower for negative than for positive numbers. We asked participants to pick the larger single-digit numeral in a pair and collected their implicit confidence with button pressure (button pressure was validated with three empirical signatures of confidence). We also modeled their choices with a drift-diffusion decision model to compute the post-decision estimate of being correct. We found that participants had relatively low confidence with negative numerals. Given that participants compared the basic base-10 symbols (0–9) with high accuracy, reduced confidence may be a general feature of manipulating abstract negative numerals, as they produce more uncertainty than positive numerals per unit of time.

Introduction

Negative numbers are essential in mathematics. However, they are hard, as exemplified by the historical reluctance to accept them, the trouble of teaching them, their abstract nature, and the fact that humans process them slowly [1–4]. Here we focus on the hypothesis that confidence is worse for negative numerals than for positive numerals. Participants picked the larger in a pair of single-digit positive, negative, or 1/positive numerals. Single digits are important because they are the base-10 symbols and the foundation for the more complex multi-digit processing [5]. To test the hypothesis, we collected an implicit measure of confidence (button pressure) and estimated the amount of information produced by negative numerals with a computational model. The possibility of reduced confidence in operations requiring the basic base-10 symbols for inverted numbers (i.e., minus 0–9) is relevant as it could explain learning difficulties via math metacognition [6]. Moreover, studying confidence in single digits, the primitives of the base-10 system, is important to further our understanding of mathematical cognition and of how children and adults learn mathematics. Throughout, we use numeral to indicate the symbol and number or magnitude to indicate the mental quantity associated with the numeral.

Previous work has focused on the nature of the mental representations of negative numerals. There are two general views on how human cognition represents negative numerals: holistically or strategically. The first one states that negative numerals are assigned a holistic magnitude [4,7,8]. It means that, for instance, when trying to pick the larger between -5 and -3, the brain does so by comparing M(-5) and M(-3). Here the function M is the transformation of the numeral or symbol to an internal mental magnitude.

The second general view, the strategic one, is that there is no representation of negative magnitudes, only of concrete positive quantities. When presented with negative numerals, humans are strategic and flip the response or the task demands. This hypothesis's main implementation is the component model [1]. Negatives are decomposed into the polarity sign (-) and the positive magnitude. Suppose the task is to pick the larger negative. In that case, the mental comparison is over positive quantities, and the task demand > is changed to <. A more recent iteration of the component model states that negative numbers can be thought of as multi-digit numbers, with the polarity taking the role of an additional digit that can take only two values (+, -). The presence of a polarity and a digit makes the comparison of negative numerals a multi-attribute problem [5]. In fact, a neural network implementing multi-attribute decision making can replicate empirical findings such as sign shortcuts in mixed comparisons (e.g., when picking the larger between -52 and 34, people just use the fact that there is a negative sign, regardless of the numerical distance between the pair of numerals) and the unit-decade compatibility effect, where people are faster when the larger (smaller) two-digit number in a pair has both the larger decade and the larger unit (e.g., 42 vs. 57) than when decades and units are incompatible (e.g., 37 vs. 52) [5]. However, this neural network has only been applied to multi-digit comparisons and does not replicate other results, such as the activation of holistic negative magnitudes in some contexts [8].

Previous work has not adequately addressed the confidence question. This is important because confidence modulates efficient learning [6,9]. Here we use the following definition: confidence is the posterior probability of being correct given a choice and available evidence: p(correct | choice, evidence) [10]. The strategic and holistic hypotheses make similar predictions: negative numbers should elicit less choice confidence. The source, however, is different. The strategic hypothesis predicts that reduced confidence comes from the additional encoding. For instance, dropping the minus sign could induce this confidence: p(correct | choice, drop sign, positive magnitudes, other encoding). However, reduced confidence cannot come from negative magnitude processing because the mind does not represent negative magnitudes, only positive ones. The holistic hypothesis, on the contrary, places part of the reduction of confidence in negative magnitude processing, i.e., confidence_pos = p(correct | choice, positive magnitudes, other encoding) and confidence_neg = p(correct | choice, negative magnitudes, other encoding).

We will focus on a task that increases the possibility of finding holistic negative magnitudes in the participants. When people compare positive and negative numerals, randomly mixed across trials, response times and accuracy are consistent with the holistic theory [8]. Our goal is the measurement of confidence in negative numbers. One methodological challenge is that people are highly accurate with single digits, which could saturate explicit confidence reports. To circumvent this possibility, we collected an implicit motor-based report: button pressure. Previous work in number cognition has found that force is affected by number representations [11–15]. From this work, we know that tasks that do not ask for explicit magnitude comparisons (e.g., parity judgment tasks) induce a categorical response: larger numbers induce strong force and smaller numbers weak force; there is no smooth gradient by numerical distance [13]. On the other hand, people can map numerical magnitude to a smooth force gradient in tasks that explicitly ask for magnitude estimates (e.g., transform a number into squeezing pressure) [14]. Thus, we argue that measuring button pressure could provide a window into confidence, given that the previous literature has found projections of force based on number information.

Here we will ask for a magnitude judgment in a number comparison task where participants must pick the larger number in a pair. Our main interest in this paper is to link button pressure and confidence, and check if numeral type (1/positive numerals, negative numerals, and positive numerals) modulates pressure. In the discussion we will address the consequences for mental representations of numbers and the two conflicting theories for negative numerals (holistic vs component).

Given our interest in confidence, we will confirm three theory-based properties in button pressure [10] (Fig 1): 1) a positive relationship between accuracy and button pressure, 2) higher button pressure in correct trials, and 3) higher accuracy in trials with high button pressure. Thus, if button pressure is a proxy for confidence, it should follow these three characteristics.

Fig 1. Illustration of confidence properties.

Fig 1

Correct answers are more likely with higher confidence (left panel), confidence in correct trials is higher (center panel), and, for a given level of discriminability/difficulty, trials with higher confidence should be more accurate (right panel). This figure was simulated using a uniform distribution for discriminability (U(0,1)) and a normal distribution for the perception of discriminability levels (details in [10]). This figure is not a number comparison model (for that, see Fig 5), just an illustration of the confidence properties proposed by [10].

The confidence that we study depends on choice and available evidence: p(correct | choice, evidence) [16]. What is critical about this definition is that confidence is an internal estimate that depends on the existence of an actual choice. If there is no choice, there is no confidence under this definition, only uncertainty. In our task, the choice is between a pair of numerals (pick the larger) and the available evidence is the numerals on screen or, more precisely, the internal magnitude representation of those numerals. To obtain a proxy of the internal evidence produced by the numerals, we simulated the decision process behind single-digit comparisons with a drift diffusion model (DDM) [17]. We will provide details in the following sections. This modeling framework allows simulating an internal decision variable from the response times and accuracy of each participant (Fig 2). More importantly, we can calculate confidence from the internal evidence, i.e., from the black trace in Fig 2 (further details in Methods).

Fig 2. Decision model.

Fig 2

Individuals select an option when an internal noisy decision variable (black trace) accumulates up to one of the thresholds (red dashes). Non-decision times are constant (green rectangles) and include initial encoding of the stimulus and response-related commands.

We found that button pressure follows the confidence properties and differs between numeral types, and that the decision dynamics obtained with the DDM also differ between numeral types. Taken together, the results indicate that single-digit negative numerals elicit less confidence. Given the high performance in our simple task (we control for accuracy and response times), the reduction in confidence seems to be a feature of dealing with negative numerals rather than of math ability.

Exp. 1. Single-digit numeral comparison task and decision model

Participants had to pick the larger single-digit number in a pair of positives, a pair of negatives, or a pair of 1/positives. The participant saw a random pair type on each trial, but types were never mixed within a pair (e.g., never a pair containing a positive and a negative). Previous reports have found that mixing trial types induces holistic representations [8], and we are interested in confidence in negative numbers. With Exp. 1 we seek to determine whether confidence accrues more slowly for negative numerals. We do this by using the observed accuracy and response times to model the decision variable leading to a numerosity judgment (Fig 2) and computing confidence levels at the time of choice.

Materials and methods

Participants. We recruited 50 participants for experiment 1 (27 males; mean age: 20.63 years, std.: 1.19). There was no a priori calculation of sample size. However, we model response times and accuracy, and people are consistently worse with negative numbers than with positive numbers in studies with sample sizes between 16 and 55 [1,2,4,8]; thus, our sample size is on the larger end. Also, we did a sensitivity analysis to check the required effect size given 80% power, p < 0.05, a sample size of 50, and a t-test comparing two dependent means. With the G*Power application we found that our sample size requires an effect size of 0.4 to achieve that power. This is a medium-sized effect. Differences in response times between negative and positive numerals in the previous literature are approximately 80 to 100 ms (around 10% slower) [1,2,4,8]. The literature we consulted did not report effect sizes, but being slower with negative numerals is a highly replicable and easy-to-find effect. Moreover, the effect sizes in our own data were large and showed similar differences as in previous research, between 80 and 120 ms (Cohen's d, Exp. 1 pos vs. neg: 1.68; Exp. 2 pos vs. neg: 1.46; Exp. 3 pos vs. neg: 1.96; in the results sections we show the means and standard deviations, as well as regressions with estimates and confidence intervals). Thus, we argue that our sample was sufficient to detect response time differences between positive and negative numerals with high power. We emphasize that this sensitivity analysis is a conservative approach given that we did not calculate an a priori sample size. We used RT for the sample size sensitivity analysis because previous research has found that RT correlates with confidence [18].
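This sensitivity analysis can be reproduced outside G*Power. Below is a minimal sketch, assuming the statsmodels power module (not the software used in the original analysis), that solves for the minimum detectable effect size of a paired/one-sample t-test with n = 50, alpha = 0.05, and 80% power.

    # Minimal sketch of the sensitivity analysis (assumption: statsmodels, not G*Power).
    from statsmodels.stats.power import TTestPower

    # Solve for the minimum detectable effect size (Cohen's d) of a paired t-test
    # with 50 pairs, alpha = 0.05, and 80% power.
    min_d = TTestPower().solve_power(effect_size=None, nobs=50, alpha=0.05, power=0.80)
    print(f"Minimum detectable effect size: {min_d:.2f}")  # approximately 0.4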

Participants of Exp. 1 did three tasks probing the processing of inverted information in two sessions on different days, lasting 30 to 45 minutes each. The tasks were inverted motion perception, categorization/memorization of inverted items, and numeral comparisons. The order of the tasks was counterbalanced. In this paper, we explain and report the results of the numeral comparison task. Participants were paid approximately 5 USD per session (20,000 COP).

All procedures were in accordance with the ethical standards of the institutional research committee of the economics department of Universidad Javeriana that follows international and national norms regarding research with human subjects. The research committee approved the study with approval code: FCEA-DF-0433. All participants signed a written consent form after it was explained to them.

Apparatus. Exp. 1 was run on a 13-inch laptop with a traditional QWERTY keyboard. Stimulus presentation was controlled by Psychtoolbox for Matlab.

Design. There were three trial types: a pair of positive, negative, or 1/positive numerals. The type was random on each trial. Participants had to pick the larger. The fractional quantities had a simple form (1/positives) as a control: when the numerator is always the same, such fractions are solvable through a denominator strategy [19,20]. Thus, if the comparison of negative numerals is solved by comparing positive values, these two trial types should be more similar than not (e.g., -3 vs. -5 is like 1/3 vs. 1/5, if participants pick the smaller to solve the task correctly). There was no mixed trial type (e.g., comparing a positive to a negative).

We presented numbers from 2 to 15 across 870 trials. 150 of those trials were dummy trials. We called them dummy trials because they are not used for data analysis. We used them so that participants experienced the single digit 9 and double digits. Non-dummy trials (explained below) do not include 1 and 9 to avoid anchor strategies. By anchor strategies we mean that every time 1 or 9 (an anchor) is present among single digits, the response is trivial: it is the smaller (if 1) or the larger (if 9), without the need to estimate any numerical distance between the pair of digits on screen. We wanted to avoid such a strategy in our participants given that single digits do not necessarily and automatically activate magnitude representations [21,22] and strategic behavior is a well-known confounder in animal and human choice behavior [23].

For non-dummy trials, namely those that we include in the data analysis, we presented single digits using the numbers 2, 3, 5, 7, 8 and generated one exemplar pair for each logarithmic numerical distance between the pair (later we compare models with linear or logarithmic numerical distances). We did not use 4 and 6 for practical reasons, i.e., fewer logarithmic distances (in log space each pair of digits has a unique distance). Also notice that with the set 2, 3, 5, 7, 8 there are the same six possible linear numerical distances as with the set 2, 3, 4, 5, 6, 7, 8. Non-dummy trials did not include the numbers 1 and 9 to avoid anchor strategies (see the previous paragraph for an explanation). Participants could not distinguish between dummy and non-dummy trials as they were randomly presented and otherwise identical.

Participants experienced 10 different exemplar pairs of digits across 10 logarithmic distances in non-dummy trials. The larger number appeared randomly on the left or right side of the screen. The distribution of non-dummy trials was as follows (Table 1; dummy trials were random in type and distance and were not included in any of the analyses).

Table 1. Distribution of non-dummy trials by numeral pair and logarithmic distance.

log. distance* Num A Num B Trials Exp. 1 Trials Exp. 2
0.134 8 7 72 36
0.336 7 5 72 36
0.405 3 2 72 36
0.470 8 5 72 36
0.511 5 3 72 36
0.847 7 3 72 36
0.916 5 2 72 36
0.981 8 3 72 36
1.253 7 2 72 36
1.386 8 2 72 36

*log(numA)-log(numB).
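The logarithmic distances in Table 1 follow directly from the digit set {2, 3, 5, 7, 8}; below is a minimal sketch that reproduces the first column of the table.

    # Reproduce the log. distances of Table 1 from the non-dummy digit set.
    from itertools import combinations
    from math import log

    digits = [2, 3, 5, 7, 8]
    pairs = sorted(combinations(digits, 2), key=lambda p: log(p[1] / p[0]))
    for small, large in pairs:
        print(f"{large} vs {small}: log distance = {log(large / small):.3f}")
    # 8 vs 7: 0.134, 7 vs 5: 0.336, 3 vs 2: 0.405, ..., 8 vs 2: 1.386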

Stimuli and procedure. On every trial, participants saw a pair of numerals on the screen (black background). Their task was to report the larger numeral by pressing Z if the left one was larger or M if the right one was larger (QWERTY keyboard). The larger numeral appeared randomly on the left or right side of the screen. If a response took more than 3 seconds the trial ended, and the participant saw an on-screen message indicating that the response was too slow. Participants saw a cue to distinguish the numeral type in the trial: positive digits were blue, negative digits green, and fractions cyan. When a response was incorrect, the participant received feedback: the numbers turned red.

Data analysis. We used panel linear regressions to analyze the accuracy (linear probability model) and response times up to two standard deviations from the mean of all the data (i.e., 92% of the trials). The panel regressions included effects for subject-level heterogeneity, i.e., intercept variability by subject.

Specifically, we ran random effects panel regressions to include effects of experiments’ order (fixed effects do not allow between-subject variables). We understand random effects as in econometrics [24]: subject-level baselines/intercepts that are independent of the remaining independent variables; a valid assumption because all participants went through the same experimental conditions.

We control for experiment order to check for spill-over effects. Also, we control for whether the current trial was preceded by an error to account for behavioral effects caused by the error feedback.

We included interactions in the regressions when theoretically relevant, namely when the dependent variable was response time (distinct slopes by numeral type suggest different mental magnitudes) or pressure (Fig 1, central panel, shows distinct slopes for correct and incorrect trials).

All the regressions use logarithmic distance between the numerals because in all cases it improved the overall model, as measured by BIC (Bayesian information criterion). BIC is a measure of model fit based on the likelihood of the data and a penalty for the number of parameters. In all tables we write the BIC comparison between a model with linear numerical distances and logarithmic distances.

In all regressions, the reference category is positive numerals, hence they do not appear in the tables.
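For concreteness, here is a hedged sketch of one such random-effects panel regression, assuming trial-level data in a pandas DataFrame and the linearmodels library (the paper does not name its regression software); the file and variable names are illustrative.

    # Hedged sketch of a random-effects panel regression (Exp. 1 accuracy model).
    import pandas as pd
    from linearmodels.panel import RandomEffects

    df = pd.read_csv("exp1_trials.csv")          # hypothetical trial-level file
    df = df.set_index(["subject", "trial"])      # entity (subject) x time (trial) panel

    # Subject-level intercepts are the random effects; numeral-type dummies,
    # log. distance, RT, task order, and a post-error dummy are regressors.
    model = RandomEffects.from_formula(
        "accuracy ~ 1 + neg + frac + log_dist + rt + order_rdm_n + after_error",
        data=df,
    )
    result = model.fit(cov_type="robust")        # robust covariance, as in the tables
    print(result)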

The drift-diffusion model (DDM). We modeled the decision of selecting the larger number in a pair with a drift-diffusion model (Fig 2) [25]. The drift-diffusion model successfully captures accuracy and response times in a myriad of tasks, including number-related ones. It can also describe confidence [26], namely the probability of being correct given a choice and an internal decision variable [16].

In the drift-diffusion framework, the brain produces and accumulates a decision variable at each time step (e.g., millisecond), favoring either of the available options (Fig 2; black trace). One can think of this decision variable as evolving brain activity of areas representing the relevant cognitive variables. In more abstract terms, the decision variable is all the information necessary to produce a choice.

In a numeral comparison task, the decision variable is the perceived distance between the pair of numerals [27,28]. This decision variable is noisy, so it must accumulate to a threshold before committing to an option. For example, suppose the internal decision variable hits a threshold A, representing the largest numeral. In that case, the participant selects option A. However, if it hits threshold B, they select option B, representing the incorrect smaller numeral. Thus, incorrect decisions exist because the decision variable is noisy and there is a probability of arriving at the wrong answer.

Threshold levels, the speed of accumulation (drift rate), noise, and the starting point of the decision variable are considered decision parameters, as they determine the evolution of the black trace in Fig 2. The model also considers constant non-decision times (Fig 2; green squares). These are times unrelated to the formation and accumulation of the decision variable. They include initial encoding (e.g., if needed, stripping minus signs) and response-related commands (e.g., if needed, flipping the task from < to >, or switching from the left to the right index finger to execute the motor command). The time it takes to reach either threshold, plus the constant non-decision time, is the response time.

An underlying assumption of the framework is that decision and non-decision times are independent and sequential; multi-step decision variables are not part of the standard drift-diffusion theory [17]. Thus, our modeling exercise assumes that numerals' mental magnitudes are unrelated to the numerals' initial encoding and final response commands, i.e., in Fig 2 the black trace and green squares are independent. In other words, we assume that stimulus and response encoding (Fig 2, green squares), such as detecting whether there is a negative numeral on screen, stripping the minus sign, or pressing the left or right key, are independent of the main decision loop based on noisy estimates of numerical distance (Fig 2, black trace). Independence between encoding and magnitude processing is not unusual in the literature. For instance, in the component model, polarity information (plus or minus) is separated from magnitude information [1]. The holistic theory of [4] proposes the formula -B*M(|x|) for negative numeral magnitudes. Here x is the numeral, M is a function that transforms the numeral into a mental magnitude, and B implements compression and the placement of the magnitude on the negative section of the mental number line (the minus sign). Note that M does not take the flipping operation -B as an argument, i.e., M is independent of non-magnitude processing.

The drift-diffusion model (Fig 2) had six parameters: 1) drift rate of evidence (DR), 2) symbolic manipulation (SYM), 3) ± bound (BO), 4) range around zero for the inter-trial variability of the initial point of accumulation (IC), 5) non-decision time (NDT), and 6) inference uncertainty (UN). The drift rate determines how fast information is accumulated and depends on trial difficulty such that easy trials accrue information faster: DR_trial = DR * (num. distance)^SYM. The parameter SYM accounts for the possibility that symbols enhance the manipulation of mental magnitudes [29]. It can take any value between 0 (symbols cancel numerical distance effects) and 1 (raw numerical distance affects evidence drift). For numerical distance, we used logarithmic scales [30]. Inter-trial variability of the initial point of accumulation (IC) is a range around zero, and the non-decision time (NDT) is a constant time added to the obtained decision time on each trial.
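To make the parameterization concrete, here is a minimal single-trial simulation sketch under these assumptions. Parameter values are illustrative, not fitted values; the noise is fixed to 1 (a common convention), and the reading of IC as uniform start-point jitter around zero is our interpretation.

    # Minimal single-trial simulation of the drift-diffusion process in Fig 2.
    import numpy as np

    def simulate_trial(log_dist, DR=2.5, SYM=0.15, BO=0.9, IC=0.25, NDT=0.4,
                       noise=1.0, dt=0.001, rng=np.random.default_rng(0)):
        drift = DR * log_dist ** SYM              # trial drift: DR * (num. distance)^SYM
        dv = rng.uniform(-IC, IC)                 # start-point jitter in a range around zero
        trace = [dv]
        t = 0.0
        while abs(dv) < BO:                       # accumulate noisy evidence to a bound
            dv += drift * dt + noise * np.sqrt(dt) * rng.normal()
            trace.append(dv)
            t += dt
        correct = dv >= BO                        # upper bound codes the correct (larger) option
        return correct, t + NDT, np.array(trace)  # choice, response time, dv stream

    correct, rt, dv_trace = simulate_trial(log_dist=0.405)   # e.g., the pair 3 vs 2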

The parameter UN determines the precision of a Bayesian inference on the mean value of the decision variable (μ_dv; black trace in Fig 2). Specifically, by Bayes' rule:

p(μ_dv | dv) ∝ p(dv | μ_dv) p(μ_dv)

With a normal likelihood p(dv | μ_dv) and a uniform prior p(μ_dv), the resulting posterior is also normal (for the derivation, see [31]):

p(μ_dv | dv) ~ Normal(dv̄, UN/√n)   (1)

The mean (dv̄) is the mean of the decision-variable stream up to sample n. The standard deviation is the inference uncertainty divided by the square root of the total number of samples n: UN/√n. A large UN means that more samples are needed to reduce uncertainty about the estimated μ_dv.

Given that positive values of the decision variable (dv) indicate a correct response, we estimate confidence with the posterior probability that μ_dv is greater than zero at threshold; specifically, confidence = p(μ_dv > 0 | dv). It is important to highlight that the uncertainty UN is not the same as confidence in the model. UN is the base standard deviation of the inference, while confidence is the probability that μ_dv is greater than zero after a choice is made (i.e., after dv arrives at one of the thresholds).
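A minimal sketch of this confidence computation under Eq 1, taking a simulated decision-variable stream (for example, dv_trace from the simulation sketch above) and a value of UN:

    # Confidence = posterior probability that mu_dv > 0 at the time of choice (Eq 1).
    import numpy as np
    from scipy.stats import norm

    def confidence(dv_trace, UN=2.5):
        n = len(dv_trace)                     # number of accumulated samples
        dv_bar = np.mean(dv_trace)            # posterior mean of mu_dv
        sd = UN / np.sqrt(n)                  # posterior standard deviation: UN / sqrt(n)
        return 1.0 - norm.cdf(0.0, loc=dv_bar, scale=sd)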

Drift rate (DR), compression due to symbolic manipulation (SYM), bound (BO), inter-trial variability of the initial point of accumulation (IC), and non-decision time (NDT) were fitted to each individual's data using the pyDDM library for Python [32], after excluding slow response times (> 2 std. dev. from the mean of all the data; we lost approx. 8% of trials). Each numeral type was fitted separately; thus, we assume that the non-decision time captures encoding differences, such as detecting whether there is a negative, positive, or 1/positive numeral on screen or dropping the minus sign. Once encoded, each numeral type generates its own decision process.

To check if drift changed between positive and negative numerals, we calculated |DR_pos − DR_neg|. With the resulting vector, we did a one-sample t-test against zero change (p-values corrected for multiple comparisons with Holm-Sidak). We used the absolute distances as we do not have a particular directional hypothesis (but see the paired t-tests in the Supplemental Information).
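A hedged sketch of these tests, with hypothetical per-subject fitted drift rates standing in for the pyDDM output (the 15 jointly corrected p-values of Table 1.3 would come from the five parameters crossed with the three numeral-type contrasts):

    # One-sample t-test of |DR_pos - DR_neg| against zero, with Holm-Sidak correction.
    import numpy as np
    from scipy.stats import ttest_1samp
    from statsmodels.stats.multitest import multipletests

    rng = np.random.default_rng(1)
    drift_pos = rng.normal(2.46, 0.77, 50)    # hypothetical per-subject fits (50 subjects)
    drift_neg = rng.normal(2.56, 0.71, 50)

    t, p = ttest_1samp(np.abs(drift_pos - drift_neg), 0.0)

    # In the paper, 15 such p-values are corrected jointly; here a placeholder list.
    pvals = [p] * 15
    rejected, p_corrected, _, _ = multipletests(pvals, alpha=0.05, method="holm-sidak")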

To implement Eq 1, we simulated multiple trials for each participant (150 per numerical distance) and applied Eq 1 to the resulting decision-variable stream at the end of each trial. We manually set the inference uncertainty (UN) to qualitatively demonstrate confidence effects. UN does not affect the fitting procedure; it is a free parameter.

Results

Mean accuracy was almost at ceiling (Fig 3; Exp. 1 (mean, std): 1/pos: 0.95, 0.04; neg: 0.93, 0.05; pos: 0.92, 0.05). Table 1.1 reveals that accuracy was slightly higher with negative numerals and 1/n numerals. Accuracy improved with larger numerical distances. Experiment order in Exp. 1 was a significant predictor of accuracy (but accuracy was still high regardless of order, i.e., an intercept of 88%). Those who started with the random dot motion task on day 1 and did the numeral and memory tasks on day 2 (i.e., order RDM_N) had better accuracy. If a trial was preceded by an error, the response was more likely to be correct on that trial.

Fig 3. Accuracy (top) and response time distributions by percentiles (bottom) by trial type (Exp 1.).

Fig 3

Numerical distance in log. scale. Response times are for correct trials; see Supplemental Information for incorrect trials.

Table 1.1. Accuracy.

Exp. 1.1 Random Effects Estimation.

Variable Par. Std.Err t p Low CI High CI
Intercept 0.88 0.01 127.04 0 0.87 0.89
1/n 0.03 0 9.57 0 0.02 0.04
Neg. 0.01 0 3.59 0 0.01 0.02
Order M_RDM 0.01 0 3.18 0 0 0.02
Order RDM_N 0.05 0.01 8.64 0 0.04 0.06
Order N_RDM 0.02 0 4.71 0 0.01 0.02
Num. dist 0.04 0 10.84 0 0.03 0.04
RT 0.01 0.01 1.3 0.19 0 0.02
Error_next_trial 0.06 0 42.82 0 0.06 0.07
Cov. Estimator: Robust Log-likelihood 784.43 F (8,32907): 236.28
No. subj: 50 No. Obs: 32916 P-value 0
BIC: -1475 BIC vs Linear: -45

M: Memory task; RDM: Random dot motion task; N: Numeral task,

Response times behaved similarly to previous number cognition research (Table 1.2; Exp. 1 (mean, std): 1/pos: 775 ms, 101 ms; neg: 792 ms, 94 ms; pos: 710 ms, 83 ms). Participants were slower with negative and 1/n numerals. Also, there was an effect of numerical distance: as the distance between numerals increased, response times got faster (Table 1.2). The RT slopes for fractions and negatives were steeper (Table 1.2, interaction terms with distance). This difference is consistent with the holistic theory. The strategic hypothesis does not predict changes in slope/processing speed because, under that account, we do not represent negative magnitudes. Participants who experienced the order RDM_N were faster. If a trial was preceded by an error, that trial was slower.

Table 1.2. RT.

Exp. 1. Random Effects Estimation.

Variable Par. Std.Err t p Low CI High CI
Intercept 0.75 0.02 39.66 0 0.71 0.78
1/n 0.1 0.01 19.19 0 0.09 0.11
Neg. 0.1 0.01 19.51 0 0.09 0.11
Order M_RDM 0 0.03 0.02 0.98 -0.05 0.05
Order RDM_N -0.14 0.06 -2.18 0.03 -0.26 -0.01
Order N_RDM -0.01 0.03 -0.27 0.79 -0.06 0.05
Num. dist -0.04 0 -10.44 0 -0.05 -0.04
1/n:Dist -0.05 0.01 -7.66 0 -0.06 -0.03
Neg:Dist -0.03 0.01 -4.12 0 -0.04 -0.01
Error_next_trial 0.02 0 4.34 0 0.01 0.03
Cov. Estimator: Robust Log-likelihood 9347.6 F (9,32906): 223.87
No. subj: 50 No. Obs: 32916 P-value 0
BIC: -18591 BIC vs Linear: -275

M: Memory task; RDM: Random dot motion task; N: Numeral task.

In summary, the behavioral results replicate traditional outcomes in the number cognition literature: high accuracy in single-digit comparisons, distance effects in response times, and slower processing of negative numerals.

Given the observed accuracy and response times we can model the decision process. The decision model simulates cognitive evidence, and we can compute a confidence metric to see whether negative and positive numerals produce distinct confidence levels. We simulate the decision process with a drift diffusion model and illustrate with this framework that, given the observed response times and accuracies, positive numerals generate stronger internal evidence per unit of time. The drift diffusion model, like any modelling exercise, has assumptions (see Methods), but within that framework it provides insights into human confidence in numerals.

Drift diffusion results

The drift-diffusion decision model captured mean accuracy and response time percentiles (Fig 3). However, the 0.9 RT percentile was faster in participants than in the model. Such a faster response in the 0.9 percentile could indicate other processes, unrelated to the task, that the model does not capture. Given that this percentile is the slowest, perhaps responses in these trials happened before the cognitive computations ended.

To account for the high accuracy, we introduced a symbolic facilitation/compression parameter. We tried a simpler model without it but it failed to produce high accuracies (i.e. worse fit). Thus, in single-digit symbolic tasks, there seems to be facilitation by symbols that boost accuracy, consistent with the notion that numerals are a cognitive technology [29].

There was a change in decision dynamics across inverted (negative and 1/positive) and positive numerals (Table 1.3). This change was not concentrated in non-decision times, suggesting that it is not a simple encoding effect (e.g., dropping signs). Drift, symbolic compression, bounds, and the range of the initial point of accumulation differed. We tried paired-sample t-tests to test directionality, but no consistent direction was evident (Supplemental Information). This could mean that there is no universal effect of numeral type on decision-making parameters: some participants may modulate their drift rate, others compensate by reducing decision bounds, others strictly follow an encoding strategy such as dropping minus signs, and so on.

Table 1.3. Avg. parameters of individual fits and one-sample t-tests (vs zero change).
Experiment 1
Drift Sym. Compress Bound IC_range NDT
1/n
mean 3.01 0.22 1.03 0.32 0.43
std 0.88 0.11 0.65 0.21 0.09
t-test vs. neg t(49) = 8.37 t(49) = 8.81 t(49) = 2.90 t(49) = 8.21 t(49) = 4.72
t-test vs. pos t(49) = 9.75 t(49) = 10.71 t(49) = 3.04 t(49) = 7.05 t(49) = 5.31
Negative Num.
mean 2.56 0.19 0.92 0.33 0.43
std 0.71 0.1 0.2 0.18 0.07
t-test vs. pos t(49) = 8.58 t(49) = 9.92 t(49) = 7.73 t(49) = 7.57 t(49) = 9.70
Positive Num.
mean 2.46 0.09 0.77 0.22 0.41
std 0.77 0.12 0.13 0.09 0.06
Experiment 2
Drift Sym. Compress Bound IC_range NDT
1/n
mean 3 0.14 1.59 0.37 0.69
std 1.04 0.13 1.77 0.27 0.2
t-test vs. neg t(46) = 6.92 t(46) = 8.21 t(46) = 2.71 t(46) = 8.78 t(46) = 3.64
t-test vs. pos t(46) = 6.22 t(46) = 8.92 t(46) = 3.22 t(46) = 6.12 t(46) = 4.73
Negative Num.
mean 2.79 0.14 1.32 0.45 0.72
std 0.85 0.13 0.77 0.23 0.16
t-test vs. pos t(46) = 7.62 t(46) = 10.27 t(46) = 4.40 t(46) = 8.80 t(46) = 5.64
Positive Num
mean 2.65 0.09 1.2 0.3 0.67
std 0.91 0.14 0.76 0.18 0.16
Experiment 3
Drift Sym. Compress Bound IC_range NDT
1/n
mean 2.93 0.16 1.27 0.40 0.67
std 0.85 0.12 0.61 0.27 0.12
t-test vs. neg t(44) = 8.96 t(44) = 8.69 t(44) = 4.43 t(44) = 7.27 t(44) = 5.63
t-test vs. pos t(44) = 8.13 t(44) = 9.31 t(44) = 5.19 t(44) = 7.55 t(44) = 7.18
Negative Num.
mean 2.49 0.14 1.12 0.49 0.71
std 0.61 0.14 0.26 0.23 0.14
t-test vs. pos t(44) = 8.74 t(44) = 8.61 t(44) = 7.73 t(44) = 9.27 t(44) = 5.83
Positive Num.
mean 2.31 0.11 0.88 0.23 0.66
std 0.55 0.12 0.15 0.10 0.08

All p-values < 0.05, corrected for multiple (15) comparisons (Holm-Sidak).

In the model, confidence is defined as the probability that the mean of the decision variable (the black trace in Fig 2) was positive over the trial. We calculated confidence at the end of each trial, namely when the decision variable hit one of the thresholds, i.e., when a choice was made. The model has the expected characteristics (Fig 4): a) accuracy improves with higher confidence, b) correct trials have higher confidence, and c) trials with larger confidence, as determined by a median cut, are more accurate.

Fig 4. Confidence properties in the drift diffusion model (Exp 1).

Fig 4

Left panels: accuracy increased with more confidence. Center panels: there was low confidence in incorrect trials. Right panels: accuracy is higher in high confidence trials. Error bars are s.e.m.

The free parameter affecting confidence in the model was the uncertainty parameter UN (Eq 1). We highlight that UN is not the same as confidence; uncertainty UN is the parameter modulating the standard deviation of the inference on the accumulated evidence (Eq 1), while confidence is the probability of being correct after committing to a choice. We tested two possibilities. First, uncertainty UN was equal for 1/positive, negative, and positive numbers. Second, uncertainty UN was lower for positive numbers. This second option represents the possibility that positive numbers generate more information per sample. In Eq 1, the standard deviation is divided by the square root of the number of samples. Therefore, if UN is lower, the standard deviation of the inference gets tighter faster and will improve confidence with fewer samples.

Fig 5 shows that if we assume equal uncertainty UN, trials with positive numbers produce less confidence (center panels, Exp. 1, 2, and 3). This happens because trials with positive numbers are generally faster (Table 1.2), producing fewer samples, and an estimate based on Eq 1 is less confident. On the other hand, if we assume that positive trials have a lower uncertainty for each sample of the decision variable, as indexed by the UN parameter, then positive numbers increase in confidence (right panels).

Fig 5. Reduced uncertainty in trials with positive numerals.

Fig 5

Exp. 1: models with equal uncertainty used UN = 2.5 for all types of numerals. Models with different uncertainty used UN_pos = 1.65, UN_neg = 2.4, UN_frac = 2.4. Error bars are s.e.m.

To reiterate, this qualitative result from the model suggests two possibilities for confidence in negative numeral judgments: 1) negative numerals induce more confidence because they have accumulated more samples at the time of choice and have an uncertainty UN similar to positive numerals (note that Fig 5 already takes into account potential differences in the other parameters of the model); or, on the other hand, 2) negative numerals could induce less confidence because their uncertainty UN is larger.
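A small numerical illustration of these two scenarios, reusing the confidence computation sketched in the Methods; the UN values are those used for Fig 5, while the mean decision variable and the sample counts (a faster positive trial with fewer samples) are illustrative assumptions, not fitted quantities.

    # Illustration of equal vs. lower UN for positive numerals (UN values as in Fig 5).
    import numpy as np
    from scipy.stats import norm

    def conf(dv_bar, UN, n):
        return 1.0 - norm.cdf(0.0, loc=dv_bar, scale=UN / np.sqrt(n))

    # Equal UN: the faster positive trial (fewer samples, n = 400) is less confident
    # than the slower negative trial (n = 600). Sample counts are illustrative.
    print(conf(0.1, UN=2.5, n=400), conf(0.1, UN=2.5, n=600))    # ~0.79 vs ~0.84
    # Lower UN for positive numerals reverses the ordering.
    print(conf(0.1, UN=1.65, n=400), conf(0.1, UN=2.4, n=600))   # ~0.89 vs ~0.85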

Exp. 2 confidence in single-digit comparisons

In Exp. 2 we collected an implicit measure of confidence: button pressure. We assumed that stronger button pressure indicates higher confidence. Below we provide a data-based confirmation of this assumption. Importantly, the overall results of Exp. 2 suggest that negative numerals induce less confidence. This means that negative numerals produce less certainty per unit of time, i.e., a higher uncertainty parameter UN.

Materials and methods

Participants. We recruited 49 participants for experiment 2 (24 males; mean age: 20.78 years, std: 1.36; two participants’ data was saved incorrectly, and one subject mistakenly used the keyboard instead of the force sensors because we presented, by mistake, instructions indicating the available response keys on the keyboard; final sample for regressions evaluating pressure n = 46; final sample for drift diffusion modelling n = 47). There was no a priori calculation of sample size but in the preregistration of Exp. 2 we explicitly limited the number of participants to 50.

Participants of Exp. 2 did two tasks on the same day: inverted motion perception and numeral comparisons. The order of the tasks was counterbalanced. In this paper, we explain and report the results of the numeral comparison task. Participants were paid approximately 5 USD per session (20,000 COP).

All procedures were in accordance with the ethical standards of the institutional research committee of the economics department of Universidad Javeriana that follows international and national norms regarding research with human subjects. The research committee approved the study with approval code: FCEA-DF-0433. All participants signed a written consent form after it was explained to them.

Apparatus. Exp. 2 was run on a 13-inch laptop. Stimulus presentation was controlled by Psychtoolbox for Matlab. We used two force sensors, one below each index finger (Force Sensitive Resistor Interlink 402; 10 kΩ resistor; see the circuit diagram in the Supplemental Information). The force-sensitive resistor changes its resistance as a function of pressure, and a microcontroller (Arduino UNO) produces values between 0 and 1023. It detects pressure as a change in resistance, not weight. Participants pressed one of the force sensors to report their decision on each trial. The microcontroller relayed information to Matlab, which presented the stimuli using Psychtoolbox. The force sensors produced a continuous pressure signal during a trial (see example trials in the Supplemental Information). The Arduino's baud rate was 9600 bits per second. We transmitted 24 characters (240 bits) and 4 floats (128 bits), for a sampling rate of 26 Hz (i.e., 9600/368). Specifically, the Arduino sent four strings of four characters, eight newlines (via Serial.println in the Arduino code), and four float numbers to Matlab approximately every 38 milliseconds. However, empirically, we observed that Matlab collected this information around every 45 to 50 milliseconds on average, perhaps due to other processing delays (e.g., buffering, data setup, and stimulus presentation).
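The effective sampling rate follows from the reported bit counts; a minimal check, assuming the per-packet sizes as stated above:

    # Effective sampling rate of the force signal, using the packet sizes reported above.
    baud = 9600                      # bits per second
    bits_per_packet = 240 + 128      # 24 characters + 4 floats, as counted in the text
    rate_hz = baud / bits_per_packet
    print(f"{rate_hz:.1f} Hz, one packet every {1000 / rate_hz:.0f} ms")   # ~26.1 Hz, ~38 ms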

Stimuli, procedure, and design. The procedure and design were like Exp. 1 but with fewer trials (435 + 75 dummy trials). The main difference was that we captured button pressure for each response.

Data analysis. Experiment 2 followed a similar analysis to Exp. 1 (panel linear regressions with random effects and the drift-diffusion model). It was preregistered at https://osf.io/gqtja. However, this work has evolved thanks to exposure in conferences and journals. We added the drift-diffusion model and the panel regressions. Thus, the preregistration confirms that we did not change the hypothesis after obtaining the results (i.e., no HARKing; we built the whole button-pressure apparatus to test the hypothesis), but the analytic approach did change. We hope that this clarification makes the life cycle of research reports more transparent under a preregistration model of science.

We analyzed the maximum pressure on each trial, standardized to each participant's maximum pressure during the whole task (separately for the left and right sensors). For instance, if subject X's maximum pressure on the left sensor during the whole session was 984, then all left-sensor pressures during the session were divided by 984, and for a specific trial we used the maximum standardized value. We used this peak value in the regressions. The observed range for this dependent variable, after excluding slow response times (> 2 std. dev. from the mean; we lost approx. 8% of trials), was [0.025, 1] for Exp. 2 and [0.05, 1] for Exp. 3. We used pressure signals (see Supplemental Information) in the following interval: from the moment the numerals appeared on screen until the subject selected an option and was no longer pressing the force sensor (i.e., force reading < low_pressure_threshold of 100).
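A hedged sketch of this normalization, assuming long-format pressure samples in a pandas DataFrame; the file and column names are illustrative, not the actual data files.

    # Standardize pressure by each participant's session maximum, per sensor,
    # then keep the peak standardized value on each trial.
    import pandas as pd

    samples = pd.read_csv("exp2_pressure_samples.csv")   # hypothetical file with columns:
    # subject, trial, sensor ('left'/'right'), pressure (0-1023 Arduino units)

    session_max = samples.groupby(["subject", "sensor"])["pressure"].transform("max")
    samples["pressure_std"] = samples["pressure"] / session_max
    max_press = (samples.groupby(["subject", "trial"])["pressure_std"]
                 .max()
                 .rename("max_press"))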

Results

Mean accuracy was almost at ceiling (Fig 6; Exp. 2 (mean, std): 1/pos: 0.97, 0.02; neg: 0.96, 0.03; pos: 0.95, 0.04). Table 2.1 reveals that accuracy was slightly higher with negative numerals and 1/n numerals. Accuracy improved with larger numerical distances. Experiment order in Exp. 2 was not a significant predictor of accuracy (and accuracy was high regardless of order, i.e., an intercept of 90%). If a trial was preceded by an error, the response was more likely to be correct on that trial.

Fig 6. Accuracy (top) and response time distributions by percentiles (bottom) by trial type (Exp 2.).

Fig 6

Numerical distance in log. scale. Response times are for correct trials; see Supplemental Information for incorrect trials.

Table 2.1. Accuracy.

Exp. 2. Random Effects Estimation.

Variable Par. Std.Err t p Low CI High CI
Intercept 0.9 0.01 78.64 0 0.88 0.92
1/n 0.02 0 5.81 0 0.01 0.03
Neg. 0.01 0 3.25 0 0.01 0.02
Max. press 0.05 0.01 5.36 0 0.03 0.07
Num. dist 0.02 0 5.54 0 0.01 0.03
RT 0 0.01 -0.32 0.75 -0.02 0.01
Order 0 0 -1.26 0.21 -0.01 0
Error_next_trial 0.04 0 21.49 0 0.03 0.04
Cov. Estimator: Robust Log-likelihood 3711.1 F (7,14920): 69.67
No. subj: 46 No. Obs: 14928 P-value 0
BIC: -7345 BIC vs Linear: -10

For experiment 2 we aimed to confirm the presence of the confidence properties in button pressure and to test whether numeral type modulated such pressure. We present three regressions, one for each panel of the confidence theory represented in Fig 1, while controlling for response times.

Accuracy and confidence should have a positive relation (Fig 1, left panel). Table 2.1 confirms this for Exp. 2: the pressure estimate is positive, meaning that trials with higher pressure were more likely to be correct. Moreover, a simple regression, just including button pressure as a regressor, to directly test the theoretical confidence property in the first panel of Fig 1, also finds a positive relation (Exp. 2: β = 0.05, 95%CI = [0.03, 0.07], p < 0.01).

For a given discriminability (i.e., numerical distance), trials with high confidence, as defined by a median split, should be more accurate (Fig 1, right panel). Table 2.2 reveals that the dummy regressor for the median split of max. button pressure is significant. This means that trials with above-median button pressure were more likely to be correct, controlling for numerical distance. Moreover, a simple regression, just including the median split and numerical distance as regressors, to directly test the theoretical confidence property in the third panel of Fig 1, also finds a positive estimate for the median split (Exp. 2: β = 0.01, 95%CI = [0, 0.02], p < 0.01).

Table 2.2. Accuracy.

Exp. 2. Random Effects Estimation.

Variable Par. Std.Err t p Low CI High CI
Intercept 0.93 0.01 92.15 0 0.91 0.95
1/n 0.02 0 5.66 0 0.01 0.03
Neg. 0.01 0 3.04 0 0 0.02
Median max. press. (dummy) 0.01 0 3 0 0 0.02
Num. dist 0.02 0 5.62 0 0.01 0.03
RT 0 0.01 0.46 0.64 -0.01 0.02
Order 0 0 -1.35 0.18 -0.01 0
Error_next_trial 0.04 0 22.02 0 0.03 0.04
Cov. Estimator: Robust Log-likelihood 3695.6 F (7,14920): 73.111
No. subj: 46 No. Obs: 14928 P-value 0
BIC: -7314 BIC vs Linear: -11

Correct trials should have higher levels of confidence than incorrect trials for a given discriminability (Fig 1, central panel). Table 2.3 shows that indeed correct trials had higher button pressure (Correct estimate), in a regression that controls for numerical distance. Moreover, a simple regression, excluding numeral types and just including a dummy for correct trials and numerical distance, to directly test the theoretical confidence property in the central panel of Fig 1, further confirms this overall higher button pressure in correct trials (Exp. 2: β = 0.05, 95%CI = [0.03, 0.06], p < 0.01). Fig 1, central panel, shows distinct slopes for correct and incorrect trials (positive and negative, respectively). The interaction term was not significant, although it was positive, in line with the theory; our participants' low error rate could explain the lack of interaction.

Table 2.3. Max. Press.

Exp 2. Random Effects Estimation.

Variable Par. Std.Err t p Low CI High CI
Intercept 0.56 0.02 22.66 0 0.52 0.61
1/n -0.01 0 -3.61 0 -0.02 -0.01
Neg. -0.02 0 -4.75 0 -0.02 -0.01
Correct 0.04 0.02 2.13 0.03 0 0.07
Num. dist -0.01 0.02 -0.62 0.54 -0.06 0.03
Dist:Correct 0.02 0.02 0.75 0.45 -0.03 0.06
RT 0.1 0.01 15.72 0 0.09 0.12
Order 0 0.02 -0.18 0.86 -0.05 0.04
Error_next_trial 0 0.01 0.15 0.88 -0.01 0.02
Cov. Estimator: Robust Log-likelihood 4946 F (8,14919): 33.992
No. subj: 46 No. Obs: 14928 P-value 0
BIC: -9806 BIC vs Linear: -2

Fig 7 has a visualization of all the confidence properties in max. button pressure that we explained in the previous paragraphs. Accuracy was higher with stronger button pressures (left panel). Button pressure was higher in correct trials (center panel). For a given difficulty, accuracy was higher in trials with higher confidence (right panel). The presence of these three properties indicates that participants expressed their confidence level with button pressure.

Fig 7. Confidence characteristics in button pressure (Exp. 2).

Fig 7

Error bars are s.e.m.

Importantly, in Exp. 2 trial type affected button pressure (Table 2.3). Participants reduced button pressure in trials with 1/positive and negative numerals. This reduction of button pressure for inverted numerals (negative and 1/n numerals) was present in the regressions shown in Table 2.3; the effects are not significant as simple main effects without controlling for the other variables.

Response times behaved similarly to previous number cognition research (Table 2.4; Exp. 2 (mean, std): 1/pos: 1170 ms, 178 ms; neg: 1197 ms, 183 ms; pos: 1098 ms, 149 ms). Participants were slower in inverted trials (negative and 1/n numerals). There was an effect of numerical distance: as the distance between numerals increased, response times got faster (Table 2.4). The RT slopes for fractions and negatives were steeper (Table 2.4, interaction terms with distance). This difference is consistent with the holistic theory. The strategic hypothesis does not predict changes in slope/processing speed because, under that account, we do not represent negative magnitudes.

Table 2.4. RT.

Exp. 2. Random Effects Estimation.

Variable Par. Std.Err t p Low CI High CI
Intercept 1.17 0.03 38.67 0 1.11 1.23
1/n 0.1 0.01 10.03 0 0.08 0.12
Neg. 0.12 0.01 12.66 0 0.1 0.14
Num. dist -0.05 0.01 -6.79 0 -0.07 -0.04
1/n:Dist -0.03 0.01 -2.92 0 -0.06 -0.01
Neg:Dist -0.03 0.01 -2.46 0.01 -0.05 -0.01
Order -0.04 0.04 -1.04 0.3 -0.11 0.04
Error_next_trial 0.04 0.01 3.5 0 0.02 0.06
Cov. Estimator: Robust Log-likelihood 579.26 F (7,14920): 110.96
No. subj: 46 No. Obs: 14928 P-value 0
BIC: -1082 BIC vs Linear: -88

In summary, the empirical observations indicate that button pressure is a proxy for confidence, even after controlling for response times and other confounders. Importantly, participants seem to be more confident when comparing a pair of positive numerals than the other types of numerals.

In the following section we simulate the decision process with a drift diffusion model and illustrate with this framework that, given the observed response times and accuracies, positive numerals generate stronger internal evidence per unit of time. Also, the modelling exercise explains why response times correlate positively with confidence (Table 2.3, RT estimate): more information increases confidence by reducing, with each sample, the standard deviation of the probability distribution of being correct (Eq 1).

Drift diffusion results

As in Exp. 1, the drift-diffusion decision model captured mean accuracy and response time percentiles (Fig 6). Also, there was a change in decision dynamics across inverted (negative and 1/positive) and positive numerals (Table 1.3).

The model has the expected characteristics (Fig 8): a) accuracy improves with higher confidence, b) correct trials have higher confidence, and c) trials with larger confidence, as determined by a median cut, are more accurate. The model is quantitatively different from the observed button pressure because we do not know (and so could not implement) a transfer function from confidence to button pressure. Still, the confidence output of the drift-diffusion model is insightful because, given the observed response times and accuracy, we can look at how confidence behaves under the computational model.

Fig 8. Confidence properties in the drift diffusion model (Exp 2).

Fig 8

Left panels: accuracy increased with more confidence. Center panels: there was low confidence in incorrect trials. Right panels: accuracy is higher in high confidence trials. Error bars are s.e.m.

As in Exp. 1, Fig 9 shows that if we assume equal uncertainty UN, trials with positive numbers produce less confidence (center panels). However, the left panels present the actual button pressure: button pressure for positive numbers is not weaker; if anything, it is stronger (Table 2.3). Thus, a model that assumes lower information uncertainty for positive numbers is qualitatively better at reflecting the observed effects of button pressure across numeral types. Given the observed accuracy and response times used to estimate the DDM parameters, negative numerals seem to induce a higher uncertainty UN.

Fig 9. Reduced uncertainty in trials with positive numerals.

Fig 9

Exp. 2: models with equal uncertainty used UN = 4.5 for all types of numerals. Models with different uncertainty used UN_pos = 3.5, UN_neg = 4.3, UN_frac = 4.3.

Exp. 3. Replication of the effects with only one experimental session, no color cues, and no feedback

In Exp. 1 and 2, participants did additional tasks related to the inversion of information (e.g., reporting the opposite direction of moving dots). Even though we controlled for order effects, it is important to fully address this concern, so in Exp. 3 participants only did the single-digit numeral comparison task. Also, Exp. 1 and 2 provided error feedback, which could affect responses; in Exp. 3 there was no such feedback. Finally, in Exp. 1 and 2 the numeral types had different colors (positives blue, negatives green, 1/n cyan); in Exp. 3 we dropped these color cues.

Materials and methods

Participants. We recruited 50 participants for Experiment 3 (19 males; mean age: 19.44 years, std: 2.18; one participant's data were saved incorrectly and four had an error rate larger than 15%, unusual for single-digit comparisons, for a final n = 45; in the Supplemental Information we present analyses including the four participants with a large error rate, and the results are similar). There was no a priori calculation of sample size, but we aimed for the same number of participants as in Exp. 2. Participants were paid approximately 5 USD per session (20,000 COP).

All procedures were in accordance with the ethical standards of the institutional research committee of the economics department of Universidad Javeriana that follows international and national norms regarding research with human subjects. The research committee approved the study with approval code: FCEA-DF-0433. All participants signed a written consent form after it was explained to them.

Apparatus, stimuli, procedure, and design. The apparatus, procedure, and design were like Exp. 2. The main differences were that in Exp. 3 non-dummy trials included all the single digits in the range 2 to 8, sampled randomly to form pairs, and that the digits were always blue, regardless of numeral type. Before starting, participants in Exp. 3 did 33 training trials in which they received feedback on incorrect responses (numbers turned red). Once they finished training, we turned off the red feedback on incorrect trials and they just saw blue digits for the 435 test trials, regardless of performance. The objective of Exp. 3 was to eliminate any potential effect related to color cues or error feedback.

Data analysis. For Exp. 3, given that participants only did the numeral comparison task and there were no between-subject variables such as experiment order, we report fixed effects. They are like random effects in that they control for subject-level heterogeneity by adding a constant in the regression for each subject, but they do not make any independence assumption between that constant and the other independent variables (random effects do make such an assumption; in Exp. 1 and 2 we had to use random effects because fixed effects do not allow between-subject variables such as the order of experiments).
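A hedged sketch of the Exp. 3 fixed-effects estimation, again assuming the linearmodels library (not named in the paper); EntityEffects adds the per-subject constants described above, and the file and variable names are illustrative.

    # Fixed-effects (within-subject) panel regression for Exp. 3.
    import pandas as pd
    from linearmodels.panel import PanelOLS

    df = pd.read_csv("exp3_trials.csv").set_index(["subject", "trial"])   # hypothetical file

    model = PanelOLS.from_formula(
        "accuracy ~ 1 + neg + frac + log_dist + rt + max_press + EntityEffects",
        data=df,
    )
    result = model.fit(cov_type="robust")
    print(result)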

Results

Mean accuracy was almost at ceiling (Fig 10; Exp. 3 (mean, std): 1/pos: 0.95, 0.06; neg: 0.93, 0.09; pos: 0.93, 0.05). Table 3.1 reveals that accuracy was slightly higher with negative numerals and 1/n numerals. Accuracy improved with larger numerical distances.

Fig 10. Accuracy (top) and response time distributions by percentiles (bottom) by trial type (Exp 3.).

Fig 10

Numerical distance in log. scale. Response times are for correct trials; see Supplemental Information for incorrect trials.

Table 3.1. Accuracy.

Exp. 3. Fixed Effects Estimation.

Variable Par. Std.Err t p Low CI High CI
Intercept 0.95 0.01 80.91 0 0.93 0.98
1/n 0.03 0 8.06 0 0.02 0.04
Neg. 0.02 0 3.96 0 0.01 0.03
Max. press 0.08 0.01 7.72 0 0.06 0.1
Num. dist 0.01 0 3.49 0 0.01 0.02
RT -0.07 0.01 -7.45 0 -0.09 -0.05
Cov. Estimator: Robust Log-likelihood 3062.9 F (5,15132): 34.622
No. subj: 45 No. Obs: 15182 P-value 0
BIC: -6068 BIC vs Linear: -3

In experiment 3, we also confirmed the presence of the confidence properties in button pressure and that numeral type modulated such pressure. We present three regressions, one for each panel of the confidence theory represented in Fig 1, while controlling for response times.

Accuracy and confidence should have a positive relation (Fig 1, left panel). Table 3.1 confirms this for Exp. 3: the pressure estimate is positive, meaning that trials with higher pressure were more likely to be correct. Moreover, a simple regression, just including button pressure as a regressor, to directly test the theoretical confidence property in the first panel of Fig 1, also finds a positive relation (Exp. 3: β = 0.07, 95%CI = [0.05, 0.09], p < 0.01).

For a given discriminability (i.e., numerical distance), trials with high confidence, as defined by a median split, should be more accurate (Fig 1, right panel). Table 3.2 reveals that the dummy regressor for the median split of max. button pressure is significant. This means that trials with above-median button pressure were more likely to be correct, controlling for numerical distance. Moreover, a simple regression, just including the median split and numerical distance as regressors, to directly test the theoretical confidence property in the third panel of Fig 1, also finds a positive estimate for the median split (Exp. 3: β = 0.02, 95%CI = [0.01, 0.02], p < 0.01).

Table 3.2. Accuracy.

Exp. 3. Fixed Effects Estimation.

Variable Par. Std.Err t p Low CI High CI
Intercept 0.99 0.01 97.42 0 0.97 1.01
1/n 0.03 0 8 0 0.02 0.04
Neg. 0.02 0 3.86 0 0.01 0.03
Median max. press. (dummy) 0.02 0 6.64 0 0.02 0.03
Num. dist 0.01 0 3.55 0 0.01 0.02
RT -0.07 0.01 -7.17 0 -0.08 -0.05
Cov. Estimator: Robust Log-likelihood 3040.1 F (5,15132): 30.967
No. subj: 45 No. Obs: 15182 P-value 0
BIC: -6022 BIC vs Linear: -3

Correct trials should have higher levels of confidence than incorrect trials for a given discriminability, i.e., numerical distance (Fig 1, central panel). Table 3.3 shows that indeed correct trials had higher button pressure (Correct estimate), in a regression that controls for numerical distance. Moreover, a simple regression, excluding numeral types and just including a dummy for correct trials and numerical distance, to directly test the theoretical confidence property in the central panel of Fig 1, further confirms this overall higher button pressure in correct trials (Exp. 3: β = 0.07, 95%CI = [0.05, 0.09], p < 0.01). Fig 1, central panel, shows distinct slopes for correct and incorrect trials. The interaction term was not significant, although it was positive, in line with the theory; our participants' low error rate could explain the lack of interaction.

Table 3.3. Max. Press.

Exp 3. Fixed Effects Estimation.

Variable        Par.    Std.Err   t       p      Low CI   High CI
Intercept       0.48    0.02      24.34   0      0.44     0.52
1/n             -0.01   0         -2.25   0.02   -0.02    0
Neg.            -0.01   0         -3.05   0      -0.02    0
Correct         0.07    0.02      3.8     0      0.03     0.1
Num. dist       0       0.02      -0.02   0.98   -0.04    0.04
Dist:Correct    0.01    0.02      0.43    0.67   -0.03    0.05
RT              0.1     0.01      13.42   0      0.08     0.11

Cov. estimator: Robust. Log-likelihood: 3520.1. F(6,15131): 38.314.
No. subj.: 45. No. obs.: 15182. P-value: 0. BIC: -6973. BIC vs. linear: -3.

Fig 11 visualizes the confidence properties in max. button pressure described in the previous paragraphs. Accuracy was higher with stronger button pressure (left panel). Button pressure was higher in correct trials (center panel). For a given difficulty, accuracy was higher in trials with higher confidence (right panel). The presence of these three properties indicates that participants expressed their confidence level with button pressure.

Fig 11. Confidence characteristics in button pressure (Exp. 3).


Error bars are s.e.m.

Response times behaved similarly to previous number cognition research (Exp. 3 (mean, std): 1/pos: 1096 ms, 129 ms; neg: 1154 ms, 125 ms; pos: 1030 ms, 123 ms). Participants were slower in inverted trials (negative and 1/positive numerals). There was an effect of numerical distance: as the distance between numerals increased, response times got faster (Table 3.4). In Exp. 3, the interaction terms were not significant but had a negative sign, as in Exps. 1 and 2.

Table 3.4. RT.

Exp. 3. Fixed Effects Estimation Summary.

Variable     Par.    Std.Err   t        p      Low CI   High CI
Intercept    1.08    0.01      162.84   0      1.06     1.09
1/n          0.08    0.01      8.36     0      0.06     0.1
Neg.         0.13    0.01      13.59    0      0.11     0.15
Num. Dist    -0.06   0.01      -8.6     0      -0.08    -0.05
1/n:Dist     -0.01   0.01      -1.16    0.25   -0.03    0.01
Neg:Dist     -0.01   0.01      -0.77    0.44   -0.03    0.01

Cov. estimator: Robust. Log-likelihood: 1283.9. F(5,15132): 216.08.
No. subj.: 45. No. obs.: 15182. P-value: 0. BIC: -2510. BIC vs. linear: -18.
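The interaction specification behind Table 3.4 can be sketched in the same way as the earlier regressions (again with assumed column names); `(inv + neg) * log_dist` expands into the main effects plus the 1/n:Dist and Neg:Dist interaction terms.

```python
# Analogue of Table 3.4: response time on numeral type, log numerical distance,
# and their interactions, with subject fixed effects and robust standard errors.
rt_model = smf.ols(
    "rt ~ (inv + neg) * log_dist + C(subject)", data=df
).fit(cov_type="HC1")
print(rt_model.summary())
```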

Drift diffusion results

As in Exps. 1 and 2, the drift-diffusion decision model captured mean accuracy and response time percentiles (Fig 10). Also, there was a change in decision dynamics across inverted (negative and 1/positive) and positive numerals (Table 1.3).

The model has the expected characteristics (Fig 12): a) accuracy improves with higher confidence, b) correct trials have higher confidence, and c) trials with larger confidence, as determined by a median split, are more accurate. The model is quantitatively different from the observed button pressure because we do not know (and therefore could not implement) a transfer function from confidence to button pressure.

Fig 12. Confidence properties in the drift diffusion model (Exp 3).


Left panels: accuracy increased with more confidence. Center panels: confidence was low in incorrect trials. Right panels: accuracy was higher in high-confidence trials. Error bars are s.e.m.
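We do not reproduce the paper's exact Eq 1 or its fitted parameterization here; the sketch below is a generic sequential-sampling (drift-diffusion-style) trial in which a noisy numerical-distance signal is accumulated to a bound and confidence is read out as a Bayesian p(correct | evidence, elapsed time). The functional form of the confidence readout, the parameter values, and the use of UN as the accumulation noise are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def compare_trial(distance, un, bound=3.0, dt=0.01, ndt=0.3, max_t=10.0):
    """One simulated comparison: accumulate a noisy distance signal to a bound.

    Returns (correct, response_time, confidence). Confidence is the posterior
    probability that the chosen option is the larger one, given the accumulated
    evidence and elapsed time (flat prior, Gaussian noise with std `un`).
    This is a generic readout, not the paper's Eq 1.
    """
    dv, t = 0.0, 0.0
    while abs(dv) < bound and t < max_t:
        dv += distance * dt + un * np.sqrt(dt) * rng.standard_normal()
        t += dt
    correct = dv > 0  # the positive bound codes the correct choice
    confidence = norm.cdf(abs(dv) / (un * np.sqrt(t)))
    return correct, t + ndt, confidence

# Example trial with an arbitrary distance and uncertainty level.
print(compare_trial(distance=3.0, un=2.5))
```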

As in Exps. 1 and 2, Fig 13 shows that if we assume equal uncertainty UN, trials with positive numbers produce less confidence (center panels). However, the left panels present the actual button pressure: button pressure for positive numbers is not weaker; if anything, it is stronger (Table 3.3). Given the observed accuracy and response times used to estimate the DDM parameters, negative numerals therefore seem to induce higher uncertainty UN.

Fig 13. Reduced uncertainty in trials with positive numerals.


Exp. 3 (bottom panels): models with equal uncertainty had UN = 4.5 for all types of numerals. Models with different uncertainty had UNpos = 2.5, UNneg = 4, UNfrac = 4.5. Error bars are s.e.m.
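Continuing the simulation sketch above, holding everything else fixed and changing only the uncertainty parameter illustrates the qualitative point of Fig 13: a larger UN yields lower confidence per unit of accumulated evidence (and lower simulated accuracy, unless other parameters also change, as they do in the fitted models). The UN values below are taken from the caption purely as illustrative inputs.

```python
# Same distance, different uncertainty levels (illustrative only).
for label, un in (("pos", 2.5), ("neg", 4.0), ("1/pos", 4.5)):
    trials = [compare_trial(distance=3.0, un=un) for _ in range(5000)]
    acc = np.mean([c for c, _, _ in trials])
    mean_conf = np.mean([p for _, _, p in trials])
    print(f"{label:6s} UN={un}: accuracy={acc:.2f}, mean confidence={mean_conf:.2f}")
```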

Discussion

We measured confidence in symbolic single-digit comparisons with button pressure and a computational model. Both sources of information pointed to a reduced level of confidence in negative numerals. First, button pressure contained signatures of confidence and was weaker for negative numerals. Second, the drift diffusion model suggested a higher uncertainty parameter for negative and 1/positive numerals; thus, in the model, negative numerals increase confidence more slowly than positive numerals (Eq 1). Third, decision dynamics were different for all types of numerals. For instance, trials with positive numerals seemed to have higher symbolic facilitation. Such differences affect the decision variable, which in turn affects confidence estimates (in our case, via Eq 1). We now turn to the general implications of these results.

Reduced choice-confidence in negative numerals

We proposed button pressure as an implicit proxy for confidence. It had three theory-based characteristics of confidence and a positive correlation with response times, a correlation that was also present in a computational model that inferred the probability of being correct given the available evidence. Still, there are at least two alternatives: a) button pressure only reflects familiarity, or b) button pressure only reflects attention. That is, under these alternatives familiarity or attention would be the sole driver of button pressure, and confidence would have nothing to do with our results.

The familiarity hypothesis predicts that more familiar objects, in our case positive numerals, induce faster response times [33]. Interestingly, we found a positive relation between response times and pressure. We could not find a theory of how familiarity translates into button pressure, but a priori the prediction seems to be that the more familiar object generates a stronger motor output. Thus, familiarity seems to predict a negative relation (not a positive one): the shorter response times for familiar objects should come with higher button pressure. Our confidence model explains the positive relation: more samples reduce uncertainty and improve confidence at choice (Eq 1).

Attention, on the other hand, could produce the observed patterns of button pressure: a) more attention/pressure, more accuracy, b) more attention/pressure in correct trials, and c) higher accuracy in trials with higher attention/pressure (Fig 1). However, these three characteristics of confidence are based on a statistical theory [10], and attributing the properties to attention would need theoretical validation demonstrating that they are solely about attention, not confidence. Moreover, there are no theories proposing that negative numerals disengage attention, and even if our pressure results were solely about attention, it would still be an interesting empirical finding on its own.

Instead of treating attention or familiarity as exclusive explanations for our results, one possibility that we favor is to see these two alternatives as mechanisms supporting confidence. For instance, evidence improves faster with attended stimuli [34], and this affects confidence (via Eq 1).

We did not collect explicit self-reports (e.g., a Likert-type scale), as traditionally used in confidence research. Thus, it remains unclear whether similar effects appear with other measurements. Future work could further validate button pressure as a proxy for explicit confidence with correlational studies using self-reports. This would be interesting but would not invalidate the current results, as self-reports and behavioral outcomes are not necessarily correlated, and when they are, the correlations are usually weak [35].

Even in a simple task (single-digit comparisons), there was a detectable reduction in implicit confidence. Given the high accuracy of our participants (>90%), low confidence in abstract negative numbers may characterize their manipulation: there is a lingering trace of doubt when dealing with them. The higher uncertainty parameter UN in the model is a qualitative demonstration of a slower generation of information for negative numbers. This higher uncertainty could speak to the metacognition and learning of negatives and fractions. It is a well-established result in the education literature that inverted numbers are hard for children and adults alike [36]. Also, there is a specific metacognition for mathematics [6]. The suggestion that negatives elicit less information is an interesting result that could link both lines of research. An open question is whether this higher uncertainty has direct links to quality of education or developmental trajectories, or whether better math abilities subdue some of the uncertainty.

There was a positive relationship between confidence and response time. The conditional probability p(correct | choice, evidence) could depend on response times via the number of samples (Eq 1). This dependency on the number of samples predicts that faster response times should come with lower confidence. Interestingly, ours is not the first report of a positive relation [18]. The divide between studies that find positive and negative relations suggests distinct mechanisms for obtaining confidence from internal decision variables. Here we proposed Eq 1, but confidence could also come from other decision-variable metrics, such as the standard deviation, that could correlate positively with response times.

Negative number cognition: Holistic or strategic?

The DDM and response time results also speak to representational theories. Confidence was lower for negative numerals. Even though holistic and strategic theories both predict such an outcome, the underlying source is different. The holistic theory imputes the reduction of confidence to the holistic magnitudes of negative numerals. The strategic theory cannot impute changes in confidence to such holistic magnitudes because, under that theory, they do not exist: the mind only represents positive quantities.

Our results from button pressure alone cannot disentangle the hypotheses, but we argue that reduced confidence cannot come only from strategic considerations. Prior theoretical work suggests that confidence reports come from decision variables [16]. The drift-diffusion model relied on the perceived numerical distance to produce a metric of confidence (Eq 1), and it replicated qualitative patterns of confidence. Also, the drift-diffusion dynamics changed across numeral types. This is consistent with the possibility that negative numerals have distinct holistic magnitudes. For instance, the difference in drift rate for positive and negative numerals means that the decision variable in trials with positive and negative numerals was not comparable; even 1/positive trials had different decision dynamics, consistent with the possibility of automatic activation of proportional magnitudes [37]. Still, this requires further research because the drift-diffusion model assumes independence between decision and non-decision times. If non-decision times permeate the decision variable in our task, then part of the effect could be imputable to non-decision features. We argue that, most likely, both decision and non-decision times change between numeral types.

Response time results also favor holistic representations of negative numerals. The numerical distance slope of response times differed across numeral types in Exps. 1 and 2. Distinct response time slopes are an index of different sensitivity to numerical distance for negative numerals.

Other theoretical insights for number cognition

We did not find an effect of numerical distance on button pressure (Table 3.3), in line with previous work that also reports weak or no continuous force effects [13]. Still, we did find effects of numeral type, suggesting that number representations do affect motor planning and output. Our work was not designed to address the underlying relationship between force and numerical distance; we designed the study and analyses to study confidence. Other designs, for instance parity judgment tasks where the mental magnitude activates automatically, are better suited to address force-mental magnitude relationships.

We report that logarithmic distance explained the data better than linear distance, as measured by lower BIC values. At face value, this means that our participants treated, for instance, 1 vs. 2 differently than 8 vs. 9 (and similarly for other comparable distances). In terms of the possible source of this effect, our experiment cannot clearly differentiate whether it is a consequence of an internal logarithmic mental number line or of a frequency effect such that some distances occur more often than others in the real world (e.g., via Benford’s law). Both could also be at play, as frequency effects relate to logarithmic scales [38].
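A sketch of this kind of model comparison, reusing the regression setup from the earlier sketches (the BIC comes from the fitted statsmodels results; lower is better):

```python
# Log vs. linear numerical distance in otherwise identical accuracy models.
log_model = smf.ols("correct ~ inv + neg + log_dist + C(subject)", data=df).fit()
lin_model = smf.ols("correct ~ inv + neg + num_dist + C(subject)", data=df).fit()
print("BIC log:", round(log_model.bic), " BIC linear:", round(lin_model.bic))
print("BIC difference (log - linear):", round(log_model.bic - lin_model.bic))
```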

Methodological aspects for number cognition

Another question is the applicability of the drift-diffusion framework to comparisons of negative numerals. The framework does not apply to multi-stage decisions [17]. The question, then, is whether negative numbers are compared in a multi-stage fashion, i.e., with many decision variables. We argue it is simpler to assume that only one decision variable applies: numerical distance. Encoding, such as detecting the type of numeral on screen or dropping signs, is independent of the main decision loop that carries the decision variable, the perceived numerical distance. We assumed that non-decision times captured the detection of which type of numeral was on screen; once detected, the participant used the appropriate magnitudes. However, this is an assumption. Thus, our modeling results represent a particular context: when seeing mixed trials, negative numbers produce less information per unit of time (i.e., a larger UN parameter). That said, the overall confidence result was also present in the data; the model was a computational tool to gain further insights under the independence assumptions mentioned above.

The drift-diffusion model has been extensively applied in number cognition [27,28,39,40]. The framework has provided conceptual clarity regarding performance by integrating response times and accuracy into single measures, such as drift rate or thresholds. By modelling response times and accuracy simultaneously, it is possible to obtain a better characterization of behavior in number-related tasks. A better characterization is important because many studies report correlations between simple number tasks (e.g., ordering digits from smaller to larger) and high-level math abilities, while others do not [41,42]. Such empirical conflicts could be misinterpreted as failures to replicate when they may in fact reflect a failure to integrate both accuracy and response times [27,40].

Concerning methods, we did not find a proper transfer function to model button pressure. In our literature search we did not find a fully developed model that specifies how cognitive representations of confidence interact with motor output when pressing a button. Step functions seem insufficient because button pressure appears to reflect a continuous confidence signal (e.g., stronger for positive numbers), which is not easily explained by an all-or-nothing button press.
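As a purely hypothetical illustration of what such a continuous transfer function could look like, one could posit a saturating map from confidence to peak force. Nothing in the paper commits to this form, and all parameter names and values below are made up.

```python
import numpy as np

def press_from_confidence(conf, baseline=0.3, gain=0.5, slope=8.0, midpoint=0.75):
    """Hypothetical saturating (sigmoid) map from confidence to peak button force."""
    return baseline + gain / (1.0 + np.exp(-slope * (conf - midpoint)))

# Unlike a step function, this map produces graded force differences for graded
# confidence differences (e.g., positive vs. negative numerals).
for c in (0.60, 0.75, 0.90, 0.99):
    print(f"confidence={c:.2f} -> predicted peak force={press_from_confidence(c):.3f}")
```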

The cognitive activation of negative magnitudes is context-dependent [8]. Therefore, in tasks where comparisons of negative numerals are solved strategically (e.g., via the component model), confidence may not be affected. Our mixed design, and the fact that participants did other inversion-related tasks the same day, could have helped activate holistic negative magnitudes. However, we did not find spill-over effects in the regressions controlling for experiment order, and Exp. 3, where participants did only one task, had similar outcomes. In sum, we provided evidence that negative magnitudes can carry less cognitive information than positive ones, even in a simple arithmetic task solved by educated adults. An intriguing question is whether such a reduction of confidence translates to more complex tasks with negative quantities, and in which contexts (academic, economic, social). Negative numbers may produce more uncertainty, and this should impact learning, scientific reasoning, and decision-making.

Supporting information

S1 File

(DOCX)

Data Availability

Data is available in the Open Science Framework (DOI https://osf.io/yuvz5/).

Funding Statement

S.A. received an early career grant from the university (Pontificia Universidad Javeriana. ID PPTA 8329; https://www.javeriana.edu.co/vicerrectoria-de-investigacion/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Ganor-Stern D, Tzelgov J. Negative Numbers Are Generated in the Mind. Experimental Psychology 2008; 55: 157–163. doi: 10.1027/1618-3169.55.3.157
2. Krajcsi A, Igacs J. Processing negative numbers by transforming negatives to positive range and by sign shortcut. European Journal of Cognitive Psychology 2010; 22: 1021–1038.
3. Stephan M, Akyuz D. A Proposed Instructional Theory for Integer Addition and Subtraction. Journal for Research in Mathematics Education 2012; 43: 428–464.
4. Varma S, Schwartz DL. The mental representation of integers: An abstract-to-concrete shift in the understanding of mathematical concepts. Cognition 2011; 121: 363–385. doi: 10.1016/j.cognition.2011.08.005
5. Huber S, Nuerk H, Willmes K, et al. A General Model Framework for Multisymbol Number Comparison. Psychological Review 2016; 123: 667–695. doi: 10.1037/rev0000040
6. Vo VA, Li R, Kornell N, et al. Young Children Bet on Their Numerical Skills: Metacognition in the Numerical Domain. Psychol Sci 2014; 25: 1712–1721. doi: 10.1177/0956797614538458
7. Fischer MH. Cognitive representation of negative numbers. Psychological Science 2003; 14: 278–282. doi: 10.1111/1467-9280.03435
8. Shaki S, Petrusic WM. On the mental representation of negative numbers: Context-dependent SNARC effects with comparative judgments. Psychonomic Bulletin & Review 2005; 12: 931–937. doi: 10.3758/bf03196788
9. Drugowitsch J, Mendonça AG, Mainen ZF, et al. Learning optimal decisions with confidence. Proc Natl Acad Sci U S A 2019; 116: 24872–24880. doi: 10.1073/pnas.1906787116
10. Sanders JI, Hangya B, Kepecs A. Signatures of a Statistical Computation in the Human Sense of Confidence. Neuron 2016; 499–506.
11. Fischer R, Miller J. Does the semantic activation of quantity representations influence motor parameters? Experimental Brain Research 2008; 189: 379–391. doi: 10.1007/s00221-008-1434-5
12. Krause F, Lindemann O, Toni I, et al. Different Brains Process Numbers Differently: Structural Bases of Individual Differences in Spatial and Nonspatial Number Representations. J Cogn Neurosci 2014; 26: 768–776. doi: 10.1162/jocn_a_00518
13. Vierck E, Kiesel A. Congruency Effects Between Number Magnitude and Response Force. Journal of Experimental Psychology: Learning, Memory, and Cognition 2010; 36: 204–209. doi: 10.1037/a0018105
14. Núñez R, Doan D, Nikoulina A. Squeezing, striking, and vocalizing: Is number representation fundamentally spatial? Cognition 2011; 120: 225–235. doi: 10.1016/j.cognition.2011.05.001
15. Miklashevsky A, Lindemann O, Fischer MH. The Force of Numbers: Investigating Manual Signatures of Embodied Number Processing. Frontiers in Human Neuroscience 2021; 14: 1–16. doi: 10.3389/fnhum.2020.590508
16. Pouget A, Drugowitsch J, Kepecs A. Confidence and certainty: distinct probabilistic quantities for different goals. Nature Neuroscience 2016; 19: 366–374. doi: 10.1038/nn.4240
17. Ratcliff R, McKoon G. The diffusion decision model: theory and data for two-choice decision tasks. Neural Comput 2008; 20: 873–922. doi: 10.1162/neco.2008.12-06-420
18. Rahnev D, Desender K, Lee ALF, et al. The Confidence Database. Nature Human Behaviour 2020; 4: 317–325. doi: 10.1038/s41562-019-0813-1
19. Reyna VF, Brainerd CJ. Numeracy, ratio bias, and denominator neglect in judgments of risk and probability. Learning and Individual Differences 2008; 18: 89–107.
20. Schneider M, Siegler RS. Representations of the magnitudes of fractions. Journal of Experimental Psychology: Human Perception and Performance 2010; 36: 1227. doi: 10.1037/a0018170
21. Cohen DJ. Integers do not automatically activate their quantity representation. Psychonomic Bulletin & Review 2009; 16: 332–336.
22. Wong B, Szücs D. Single-digit Arabic numbers do not automatically activate magnitude representations in adults or in children: Evidence from the symbolic same-different task. Acta Psychologica 2013; 144: 488–498. doi: 10.1016/j.actpsy.2013.08.006
23. Ashwood ZC, Roy NA, Stone IR, et al. Mice alternate between discrete strategies during perceptual decision-making. Nature Neuroscience 2022; 25: 201–212. doi: 10.1038/s41593-021-01007-z
24. Wooldridge JM. Introductory econometrics: A modern approach. 7th ed. Boston, MA: Cengage Learning, 2020.
25. Ratcliff R, Smith PL, Brown SD, et al. Diffusion Decision Model: Current Issues and History. Trends in Cognitive Sciences 2016; 20: 260–281. doi: 10.1016/j.tics.2016.01.007
26. van den Berg R, Anandalingam K, Zylberberg A, et al. A common mechanism underlies changes of mind about decisions and confidence. eLife 2016; 5: 1–21. doi: 10.7554/eLife.12192
27. Park J, Starns JJ. The Approximate Number System Acuity Redefined: A Diffusion Model Approach. Frontiers in Psychology 2015; 6: 1–10.
28. Ratcliff R. Measuring psychometric functions with the diffusion model. Journal of Experimental Psychology: Human Perception and Performance 2014; 40: 870–888. doi: 10.1037/a0034954
29. Frank MC, Everett DL, Fedorenko E, et al. Number as a cognitive technology: Evidence from Pirahã language and cognition. Cognition 2008; 108: 819–824. doi: 10.1016/j.cognition.2008.04.007
30. Dehaene S, Izard V, Spelke E, et al. Log or linear? Distinct intuitions of the number scale in Western and Amazonian indigene cultures. Science 2008; 320: 1217–1220. doi: 10.1126/science.1156540
31. Gelman A, Carlin JB, Stern HS, et al. Bayesian Data Analysis. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC, 2004.
32. Shinn M, Lam NH, Murray JD. A flexible framework for simulating and fitting generalized drift-diffusion models. bioRxiv 2020; 1–26. doi: 10.7554/eLife.56938
33. Smith EE. Effects of familiarity on stimulus recognition and categorization. Journal of Experimental Psychology 1967; 74: 324–332. doi: 10.1037/h0021274
34. Krajbich I, Armel C, Rangel A. Visual fixations and the computation and comparison of value in simple choice. Nat Neurosci 2010; 13: 1292–1298. doi: 10.1038/nn.2635
35. Dang J, King KM, Inzlicht M. Why Are Self-Report and Behavioral Measures Weakly Correlated? Trends in Cognitive Sciences 2020; 24: 267–269. doi: 10.1016/j.tics.2020.01.007
36. Siegler RS, Fazio LK, Bailey DH, et al. Fractions: The new frontier for theories of numerical development. Trends in Cognitive Sciences 2013; 17: 13–19. doi: 10.1016/j.tics.2012.11.004
37. Jacob SN, Vallentin D, Nieder A. Relating magnitudes: the brain’s code for proportions. Trends Cogn Sci 2012; 16: 157–166. doi: 10.1016/j.tics.2012.02.002
38. Piantadosi ST. A rational analysis of the approximate number system. Psychonomic Bulletin & Review 2016; 23: 877–886. doi: 10.3758/s13423-015-0963-8
39. Dehaene S. Origins of mathematical intuitions: The case of arithmetic. Ann N Y Acad Sci 2009; 1156: 232–259. doi: 10.1111/j.1749-6632.2009.04469.x
40. Ratcliff R, Thompson C, McKoon G. Modeling individual differences in response time and accuracy in numeracy. Cognition 2015; 137: 115–136. doi: 10.1016/j.cognition.2014.12.004
41. Price GR, Palmer D, Battista C, et al. Nonsymbolic numerical magnitude comparison: reliability and validity of different task variants and outcome measures, and their relationship to arithmetic achievement in adults. Acta Psychologica 2012; 140: 50–57. doi: 10.1016/j.actpsy.2012.02.008
42. Lyons IM, Beilock SL. Numerical ordering ability mediates the relation between number-sense and arithmetic competence. Cognition 2011; 121: 256–261. doi: 10.1016/j.cognition.2011.07.009

Decision Letter 0

Federico Giove

31 Jan 2022

PONE-D-21-25521Reduced choice-confidence in negative numeralsPLOS ONE

Dear Dr. Alonso-Diaz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I received two reviews about your manuscript. Reviewer #1 in particular raised some serious methodological concerns that need to be addressed. I encourage you to amend the manuscript according to the suggestions, taking special care of the points raised by reviewer #1. Note that this can imply additional experimental work, according to how the comment about a potential design flaw is addressed. When submitting your revision, please double-check the figures and upload high-quality artwork; the current figures are barely readable.

Please submit your revised manuscript by Mar 17 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Federico Giove, PhD

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

3. Thank you for stating the following in the Funding Section of your manuscript:

“The study received funding from the university.”We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“S.A. received an early career grant from the university (Pontificia Universidad Javeriana. ID PPTA 8329; https://www.javeriana.edu.co/vicerrectoria-de-investigacion/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Summary

The authors report statistical modeling of two data sets in which adults performed magnitude comparisons with positive, negative, and fraction numerals. Their main point is that the second data set revealed weaker maximum response force for negative compared to the other number types.

Evaluation

While the main message is newsworthy, the ms fails to integrate this finding adequately into the current literature and consequently lacks important methodological and analytical detail. Furthermore, a potential flaw in the design might account for the result and needs to be addressed. A substantial revision is necessary before this ms could perhaps make a useful contribution to the field.

Major Problems

1) LITERATURE: The authors seem unaware of the distinction between kinematics and kinetics of movement. While they cite a range of kinematic studies (mouse tracking, pointing), they completely overlooked the cognitive literature on force production, beginning with Abrams & Balota (1991) and extending to the recent modeling work of Miklashevsky et al. (2021) in the numerical domain. This omission results in rather superficial reporting for the “novel” dependent measure, both in terms of data collection and data analysis (see below). Another example is the authors’ referring to Dotan’s work for log compression (p. 21) but the same authors have since revoked this account (Dotan & Dehaene, 2016).

2) METHODS AND DESIGN: The information contained on p. 6-8 is incomplete and needs to be massively expanded and systematized (separate sections for participants, apparatus, stimuli, design, procedure) in order to allow proper appreciation and replication. A key point to elaborate is the recording and subsequent analysis of force data (see Miklashevsky et al., 2021 and references therein for the complexity of this topic) to help readers understand the choice and extraction of the specific force measure used. The relationship between number magnitude and force should also be reported to relate this work to the current debate. I list here several other specific omissions:

a) There is contradictory information about the range of numbers used (either 2-15 or 2,3,5,7,8; and why not 4 and 6?) and the specific items and their frequency that resulted in the reported number of trials.

b) What is a “dummy trial”?

c) What is meant by “anchor strategies” (illustrative example and references needed)?

d) The sample size is not justified, either a priori or retrospectively.

e) There seems to be no specific ethics approval for this study (as indicated by a reference number), merely a general statement that authors complied with ethical regulations.

f) The data collection was embedded into a series of related tasks, apparently intended to prime “inversion” (motion perception, categorization) that is not sufficiently well reported to permit understanding of possible spill-over effects; ideally, absence of spill-over should be formally reported in terms of non-significant order effects.

3) RESULTS

a) While there is extensive statistical modeling, some basics remain opaque because of lack of descriptives, such as reporting of average RT or accuracy in the text. One example is the differential distance effect in accuracy (p. 11) and speed; also, the authors confused “two samples” with “two-sided” testing (p. 10, bottom).

b) The Figures in the ms are of poor quality, making even the identification of axes labels impossible. This is unprofessional. All I was able to notice is that the authors erroneously used the unit “percent” for a probability scale.

4) DISCUSSION:

a) A potential flaw in the design might account for the result and needs to be addressed. Specifically, the decision to feed back errors through color changes (p.7) established specific color transition probabilities that would be effective at the start of each new trial. This may well be the very mechanisms by which “incorrect trials priors somehow moved” (p. 18) that the authors speculate about. Should this not be ruled out with a control experiment?

b) The authors never fulfill their promise (from p. 10) to discuss implications of the assumed seriality of processes (I found line 6 on p. 21 as the only return to this important issue)

Minor Issues: These largely relate to the writing style; in light of their sheer number it is worth highlighting them:

- The authors frequently state incomplete comparisons. Already the third sentence needs to be completed “…worse for negative numerals than for …”; there are numerous such instances throughout the ms.

- The ms contains a large number of grammatically problematic formulations. A pertinent example that misleads readers is on page 8, line 4 from the bottom, where the authors stated ”It includes...” but should have stated “They include…” because it is NOT the decision variable that includes encoding and responding-related times.

- The writing is often opaque because of colloquial style (e.g., first new paragraph on p. 19). But already the very first sentence “Negatives are essential in mathematics” requires elaboration to rule out photographic negatives.

- There are some illogical statements. For example, the sentences 5 and 6 of the Intro are contradictory because lack of variability prevents strong correlations. On page 9 various cognitive processes are ascribed to the minus B parameter.

- Further intransparency results from the use of different labels for the same concept (1/positive numerals, inverted numbers, inverted problems, later 1/n fractions) and lack of definition of acronyms such as BIC (p.9).

- The ms should be checked for typos (e.g., p. 9: “striping” should be “stripping”, p. 18: “between response times” should be “with response times”)

References

Abrams, R. A., & Balota, D. A. (1991). Mental chronometry: Beyond reaction time. Psychological Science, 2, 153-157.

Dotan, D., & Dehaene, S. (2016). On the origins of logarithmic number-to-position mapping. Psychological Review, 123(6), 637–666. doi:10.1037/rev0000038

Miklashevsky, A., Lindemann, O. & Fischer, M.H. (2021). The Force of Numbers: Investigating Manual Signatures of Embodied Number Processing. Front. Hum. Neurosci. 14:590508. doi: 10.3389/fnhum.2020.590508

Reviewer #2: This paper presents experiments and models on the confidence of decision (post-decision estimate of correct) in the case of negative numbers.

It is an important issue, as many evidences both from human children and animals suggest that their coding is different from small positive numerosities and numbers.

I find particularly interesting that authors complement the study with human participants and the model.

About this, I suggest to add a wider reference on models proposed in numerical cognition. More in general, as it is a very wide field, adding more references can be useful.

Results confirm authors' hypothesis and are interesting for a wide audience; I suggest to stress the contribution of the model in the discussion.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Oct 3;17(10):e0272796. doi: 10.1371/journal.pone.0272796.r002

Author response to Decision Letter 0


3 Apr 2022

Reviewer #1: Summary

We want to start by thanking you for your detailed review; it is something we truly appreciate, and we hope to fully address your main concerns below and in the new manuscript.

The authors report statistical modeling of two data sets in which adults performed magnitude comparisons with positive, negative, and fraction numerals. Their main point is that the second data set revealed weaker maximum response force for negative compared to the other number types.

Evaluation

While the main message is newsworthy, the ms fails to integrate this finding adequately into the current literature and consequently lacks important methodological and analytical detail. Furthermore, a potential flaw in the design might account for the result and needs to be addressed. A substantial revision is necessary before this ms could perhaps make a useful contribution to the field.

Major Problems

1) LITERATURE: The authors seem unaware of the distinction between kinematics and kinetics of movement. While they cite a range of kinematic studies (mouse tracking, pointing), they completely overlooked the cognitive literature on force production, beginning with Abrams & Balota (1991) and extending to the recent modeling work of Miklashevsky et al. (2021) in the numerical domain. This omission results in rather superficial reporting for the “novel” dependent measure, both in terms of data collection and data analysis (see below). Another example is the authors’ referring to Dotan’s work for log compression (p. 21) but the same authors have since revoked this account (Dotan & Dehaene, 2016).

• Indeed, we had a bias towards kinematic studies, which we are more familiar with; this is our first force paper. We are grateful for these new references relating force and number cognition. In the introduction we tried to summarize the overall findings of this literature. page 4

• We also drop any mention that our paper uses a “novel” dependent measure. We are now aware that it is not the case.

• Regarding the Dotan paper of 2013, we no longer use it in the new manuscript.

2) METHODS AND DESIGN: The information contained on p. 6-8 is incomplete and needs to be massively expanded and systematized (separate sections for participants, apparatus, stimuli, design, procedure) in order to allow proper appreciation and replication. A key point to elaborate is the recording and subsequent analysis of force data (see Miklashevsky et al., 2021 and references therein for the complexity of this topic) to help readers understand the choice and extraction of the specific force measure used.

• We extend the explanation of the apparatus and measure (page 7) and provide a diagram of the Arduino setup in supplemental information (Supp. Fig. 13).

The relationship between number magnitude and force should also be reported to relate this work to the current debate.

• Tables 3 report this relationship. It was not significant. We did find lower button pressure for negative numerals than for positive numerals, suggesting that the nature of the number representations do affect motor planning. We discuss this in page 27.

I list here several other specific omissions:

a) There is contradictory information about the range of numbers used (either 2-15 or 2,3,5,7,8; and why not 4 and 6?) and the specific items and their frequency that resulted in the reported number of trials.

• We put a table with the actual numbers and frequencies (page 9).

• We improved the explanation of why we used the set 2, 3, 5, 7, and 8, and of why we distinguished between dummy (2-15) and non-dummy trials (2, 3, 5, 7, and 8) (pages 7-9).

• We ran a new experiment with the full set 2,3,4,5,6,7,8.

b) What is a “dummy trial”?

• A trial that exposes participants to a wider range of numbers but, and this is important, we do not include in data analysis. We clarify in the new manuscript why these trials are important to avoid anchor strategies (which we also explain better in the new manuscript and see our response to c) below). Pages 7-9

c) What is meant by “anchor strategies” (illustrative example and references needed)?

• In the set of numbers 1 to 9, 1 and 9 are the extreme anchors. Thus, if participants experience that set of numbers, every time they see a 1 or a 9 in the pair they do not have to compute any distance: they simply pick the other number (when they see a 1) or pick the 9 itself.

• Thus, dummy trials include 9 (the range of dummy trials was 2 to 15) but we do not analyze them (also because in dummy trials there are two-digit numbers).

• Including dummy trials was a conservative approach to avoid strategic behavior so that participants mostly based their judgments on numerical distance.

d) The sample size is not justified, either a priori or retrospectively.

• We now further clarify in the methods a retrospective reason (we preregistered 50), and give a rationale why we do not calculate post-hoc power. Page 6

• Sample size for the new experiment was capped at 50.

e) There seems to be no specific ethics approval for this study (as indicated by a reference number), merely a general statement that authors complied with ethical regulations.

• Sorry if this was not clear. We now further emphasize that the study was approved by an actual committee and the reference number. Page 7

f) The data collection was embedded into a series of related tasks, apparently intended to prime “inversion” (motion perception, categorization) that is not sufficiently well reported to permit understanding of possible spill-over effects; ideally, absence of spill-over should be formally reported in terms of non-significant order effects.

• We change all the regressions to random effects panel regressions to include the between-subjects factor of order (page 9 and 10). Fixed-effects regressions by definition do not accept between-subject variables. The overall findings that numeral type affected button pressure holds, even after controlling for order. Moreover, the presence of theoretically relevant effects of confidence in button pressure do not disappear after controlling for order (Tables 2,3,4,5)

• We ran a new experiment where participants only did the numeral comparison task and the results were similar.

3) RESULTS

a) While there is extensive statistical modeling, some basics remain opaque because of lack of descriptives, such as reporting of average RT or accuracy in the text. One example is the differential distance effect in accuracy (p. 11) and speed;

• We now include average and standard deviations for accuracy (page 14) and response time (page 18), by numeral type.

• Also, Figure 4 has the average accuracy and response times by percentiles, both for participants and the DDM model.

also, the authors confused “two samples” with “two-sided” testing (p. 10, bottom).

• We did mean two-sample t-test (i.e. a paired independent-samples t test). We clarify this in the new manuscript (Table 5, pages 20 and 21).

b) The Figures in the ms are of poor quality, making even the identification of axes labels impossible. This is unprofessional. All I was able to notice is that the authors erroneously used the unit “percent” for a probability scale.

• We are sorry for this. We submitted high quality figures however it seems that the ones rendered by the submission system on the manuscript were low quality ones.

• The high-quality ones had to be accessed via the link in the upper left corner of the page containing each figure.

• In case the rendering system fails, we now uploaded the manuscript with figures in place to psyarxiv: https://psyarxiv.com/sdfb9

4) DISCUSSION:

a) A potential flaw in the design might account for the result and needs to be addressed. Specifically, the decision to feed back errors through color changes (p.7) established specific color transition probabilities that would be effective at the start of each new trial. This may well be the very mechanisms by which “incorrect trials priors somehow moved” (p. 18) that the authors speculate about. Should this not be ruled out with a control experiment?

• We now include in the regressions a dummy variable controlling if the current trial was preceded by an incorrect response. The overall results were the same, which discards a simple color transition effect (Tables 2,3,4).

• Moreover, there are really few incorrect trials (approx. 5%), so such transitions were experienced rarely in most participants.

• Finally, we also ran a new experiment without feedback or any color changes and we obtained similar results

b) The authors never fulfill their promise (from p. 10) to discuss implications of the assumed seriality of processes (I found line 6 on p. 21 as the only return to this important issue)

• We agree that this is an important issue but we do not have the means to confirm the seriality. We apologize if in the original manuscript there was any indication that our paper was about seriality. For this revision, we drop any promise and discussion, and just state in the introduction that it is an important assumption of the drift diffusion framework. We think is important to make this assumption transparent for future researchers using the DDM model.

Minor Issues: These largely relate to the writing style; in light of their sheer number it is worth highlighting them:

- The authors frequently state incomplete comparisons. Already the third sentence needs to be completed “…worse for negative numerals than for …”; there are numerous such instances throughout the ms.

• Hopefully we completed all comparisons (we apologize if we still have any in the new manuscript)

- The ms contains a large number of grammatically problematic formulations. A pertinent example that misleads readers is on page 8, line 4 from the bottom, where the authors stated ”It includes...” but should have stated “They include…” because it is NOT the decision variable that includes encoding and responding-related times.

• We fixed this and (hopefully) similar ones.

- The writing is often opaque because of colloquial style (e.g., first new paragraph on p. 19). But already the very first sentence “Negatives are essential in mathematics” requires elaboration to rule out photographic negatives.

• Sorry for this. We did not try to be opaque and agree that science should be clear. We had colleagues read the original manuscript, but it was not enough. For this new submission we tried to revise any opaque writing. Also, even though we know it is just an example, we replaced the expression “negatives” with the more specific “negative numerals” or “negative numbers”.

- There are some illogical statements. For example, the sentences 5 and 6 of the Intro are contradictory because lack of variability prevents strong correlations.

• We erased those sentences. We wanted to say that high accuracy relates to high confidence, and this is a usual finding. However, we decided to drop that idea as it is not critical for the argument.

On page 9 various cognitive processes are ascribed to the minus B parameter.

• We are not sure if we understand your point here. B is compression and minus is the rotation to the negative portion. We now further clarify this in the new manuscript.

- Further intransparency results from the use of different labels for the same concept (1/positive numerals, inverted numbers, inverted problems, later 1/n fractions) and lack of definition of acronyms such as BIC (p.9).

• We changed all 1/n (or similar) to 1/positive numerals.

• We now define the acronym BIC.

- The ms should be checked for typos (e.g., p. 9: “striping” should be “stripping”, p. 18: “between response times” should be “with response times”)

• Done (hopefully).

References

Abrams, R. A., & Balota, D. A. (1991). Mental chronometry: Beyond reaction time. Psychological Science, 2, 153-157.

Dotan, D., & Dehaene, S. (2016). On the origins of logarithmic number-to-position mapping. Psychological Review, 123(6), 637–666. doi:10.1037/rev0000038

Miklashevsky, A., Lindemann, O. & Fischer, M.H. (2021). The Force of Numbers: Investigating Manual Signatures of Embodied Number Processing. Front. Hum. Neurosci. 14:590508. doi: 10.3389/fnhum.2020.590508

Reviewer #2: This paper presents experiments and models on the confidence of decision (post-decision estimate of correct) in the case of negative numbers.

It is an important issue, as many evidences both from human children and animals suggest that their coding is different from small positive numerosities and numbers.

I find particularly interesting that authors complement the study with human participants and the model.

About this, I suggest to add a wider reference on models proposed in numerical cognition. More in general, as it is a very wide field, adding more references can be useful.

Results confirm authors' hypothesis and are interesting for a wide audience; I suggest to stress the contribution of the model in the discussion.

• Thanks for your comments and noting the significance of the paper. We appreciate it.

• We now stress the contribution of the model. In particular, in regard to confidence Pages 27-28.

• We do not compare our model to other non-ddm models (e.g. Huber, et al, 2016 connectionist model) because it will be too hard and speculative.

________________________________________

Attachment

Submitted filename: Reviewers_1_PONE.docx

Decision Letter 1

Federico Giove

22 Apr 2022

PONE-D-21-25521R1Reduced choice-confidence in negative numeralsPLOS ONE

Dear Dr. Alonso-Diaz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. There are still many comments of reviewer #1 that should be addressed, in particular all those that are related to methodological issues or unclear procedures. Considering the split reviewers opinions, I'm involving a third reviewer. Note that the third reviewer will receive the current or the revised version of your paper according to the time needed for revision submission. In any case, please add to your next revision a detailed response to each point raised by reviewer #1.

Please submit your revised manuscript by Jun 06 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Federico Giove, PhD

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I have read the cover letter and the revised ms plus the supplementary document and my impression is mixed.

1. The authors made several adjustments and additions that strengthen the ms, including explanations of their idiosyncratic terminology (“dummy trials” for filler trials) and references to relevant papers that were previously omitted. However, repeatedly directing readers to Ganor-Stern & Tzelgov (2008) for the component model is still suboptimal because the paper by Huber (their Ref 5) constitutes its more recent development. Similarly, the introduction of “button pressure” as the measure of interest (on page 4) is immediately followed by references to “motor metrics”, i.e. kinematic studies, even though the “button pressure” is an isometric assessment without any kinematics involved. In my idiosyncratic view these are suboptimal revisions.

2. More importantly, the methods descriptions are still not detailed enough to permit replication. Although stimulus ranges are now clear and the recording apparatus for “button pressure” is now explained in more detail, the sensitivity of this device is still unclear. If the range is 0.25 to 0.90, does this mean that 65 levels of pressure were discriminable while the actual force produced (in Pascal or Newton) remains unknown? Another open issue, of fundamental importance for the later discussion about early attentional vs later conceptual effects: What was the sampling frequency (i.e. the temporal resolution of pressure measurements)?

3. Moreover, despite my extensive queries on this point, I still find only a SINGLE sentence that describes the force data analysis, namely “Data analysis. We used panel linear regressions to analyze the accuracy (linear probability model), response times up to two standard deviations from the mean (i.e., 95% of the trials), and button pressure”. This leaves open fundamental questions such as the time during which force was recorded or integrated, the data filtering or trimming for this inherently noisy signal, or the computation of parameters for analysis, such as average or peak force per trial, or many other candidates. None of this is explained.

4. I am unhappy with various minor aspects of this revision: The authors’ claim to defend their sample size of 50 participants “in studies with similar sample sizes (1,2,4,8)” (p. 6) is misleading readers because only Experiment 2 of ref. 4 has 55 participants, while all others have between 16 and 27. My request to determine power or sensitivity retrospectively was not addressed. Some wording is still poor: On page 7 “Exp. 1 was run in a 13-inch laptop” (should be “on”), or “Uncertainty is a more general concept as it is not a conditioned in choice.” (p. 5) is ungrammatical. Several references contain the superfluous word “internet”.

Reviewer #2: The authors have addressed most of the raised points. In my previous review I focused on the model, as it is interesting in my opinion, and the authors have included a wider reflection on this issue.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Oct 3;17(10):e0272796. doi: 10.1371/journal.pone.0272796.r004

Author response to Decision Letter 1


24 May 2022

PLEASE SEE OUR RESPONSES IN THE WORD FILE, AS THERE ARE SOME FIGURES

Reviewer #1:

I have read the cover letter and the revised ms plus the supplementary document and my impression is mixed.

1. The authors made several adjustments and additions that strengthen the ms, including explanations of their idiosyncratic terminology (“dummy trials” for filler trials) and references to relevant papers that were previously omitted. However, repeatedly directing readers to Ganor-Stern & Tzelgov (2008) for the component model is still suboptimal because the paper by Huber (their Ref 5) constitutes its more recent development.

We now extend the component model explanation with the Huber et al neural network (Intro.: Pages 3-4).

Similarly, the introduction of “button pressure” as the measure of interest (on page 4) is immediately followed by references to “motor metrics”, i.e. kinematic studies, even though the “button pressure” is an isometric assessment without any kinematics involved. In my idiosyncratic view these are suboptimal revisions.

We wanted to show to a general audience that “motor metrics” are used in cognition. We understand that those references were about kinematics. Thus, we now drop those kinematics references to avoid any confusion.

2. More importantly, the methods descriptions are still not detailed enough to permit replication. Although stimulus ranges are now clear and the recording apparatus for “button pressure” is now explained in more detail, the sensitivity of this device is still unclear. If the range is 0.25 to 0.90, does this mean that 65 levels of pressure were discriminable

The range 0.25-0.90 is continuous. There are more than 65 discrete levels.

The actual range in the raw data is 0.025 to 1 (Exp. 2) and 0.05 to 1 (Exp. 3). The 0.25-0.9 range was obtained after averaging per numerical distance. In the new manuscript (Data analysis: Page 10) we decided it was clearer to present the ranges from the raw data, because the right end is 1 and the left end is close to zero.

while the actual force produced (in Pascal or Newton) remains unknown?

The sensor detects pressure as a change in resistance. The details of the resistors and Arduino wiring are in the manuscript and supplemental information. The Arduino transforms those changes of resistance into a discrete signal between 0 and 1023. All subjects used the same Arduino setup (including voltage and resistors). Thus, we do not report Pascals or Newtons (the transformation from change in resistance to Newtons or Pascals is not trivial). Still, the change in resistance indeed relates to force; resistance-based pressure is used in many current devices including mobile phones and many portable devices.
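
For illustration, here is a minimal MATLAB sketch of reading the 0-1023 sensor values over the serial connection; the port name, baud rate, and timing are assumptions, and this is not the authors' acquisition code (the serialport interface is the one mentioned below for Exp. 3):

% Illustrative only: poll the Arduino's 0-1023 force readings for one second.
% "COM3" and 115200 baud are hypothetical; the board prints one value per line.
s = serialport("COM3", 115200);
configureTerminator(s, "LF");
readings = zeros(1, 0);
t0 = tic;
while toc(t0) < 1
    readings(end+1) = str2double(readline(s));  % each sample lies between 0 and 1023
end
clear s  % release the serial connection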

Another open issue, of fundamental importance for the later discussion about early attentional vs later conceptual effects: What was the sampling frequency (i.e. the temporal resolution of pressure measurements)?

We now report the sampling frequency of the Arduino board in the manuscript (Apparatus, pages 7-8); it is approximately 26 Hz (see the manuscript for details).

It is not fast enough to fully disentangle early attention and later concepts (holistic magnitudes).

In the discussion we acknowledge that it could be attention, but recasting all our tables and results as attention rather than confidence would require some ad hoc accommodations.

As for concepts, we do think that confidence results speak about concepts, namely the existence of negative holistic magnitudes (see discussion section). However, we are also aware that our paper only brings partial clarity to negative magnitude processing. Our paper is just about confidence.

3. Moreover, despite my extensive queries on this point, I still find only a SINGLE sentence that describes the force data analysis, namely “Data analysis. We used panel linear regressions to analyze the accuracy (linear probability model), response times up to two standard deviations from the mean (i.e., 95% of the trials), and button pressure”.

We apologize for our lack of clarity; it was not our intention. In the first revision we tried to address all your comments, but we clearly failed. We hope that our responses below clarify our data analysis.

This leaves open fundamental questions such as the time during which force was recorded or integrated,

We analyzed signals from the moment the numerals appeared on screen until the subject selected an option and was no longer pressing the force sensor (i.e., force reading < low_pressure_threshold); this applied to all trials (Data Analysis: Page 10).
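
As a sketch of this windowing step (our reconstruction with made-up values, not the analysis script itself):

% Keep the signal from numeral onset (first sample) until the reading falls
% back below a low-pressure threshold, i.e., the subject is no longer pressing.
trace = [3 15 120 640 984 210 40 8 2];   % hypothetical raw Arduino trial trace
low_pressure_threshold = 10;             % assumed cutoff value
lastPress = find(trace > low_pressure_threshold, 1, 'last');
window = trace(1:lastPress);             % samples analyzed for this trial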

the data filtering or trimming for this inherently noisy signal,

We did not filter Arduino signals. When the sensor is pressed it sends a reading to Matlab based on the participant’s pressure. In the figure below, each trial has a stereotypical look: a peak. This means that participants had a brief contact with the sensor (here we plot the raw Arduino signal, before any standardization) (see all raw data in Supplemental Figures 18 and 19).

We filtered data on the behavior side. We reported in the manuscript two criteria: response time greater than 2 standard deviations from the mean of all the data and accuracy less than 85%.
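
A minimal sketch of these behavioral criteria, assuming a two-sided 2-SD cut on response times and hypothetical data:

% Trim trials whose response time deviates more than 2 SD from the mean,
% and flag participants whose overall accuracy is below 85%.
rt      = [0.61 0.72 0.58 0.66 0.70 2.90 0.63 0.69 0.74 0.60];  % seconds (made up)
correct = [1 1 1 1 1 0 1 1 1 1];                                % per-trial accuracy
keepTrial   = abs(rt - mean(rt)) <= 2*std(rt);  % drops the 2.90 s outlier
keepSubject = mean(correct) >= 0.85;            % true here (accuracy = 0.90)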

We did detect an anomaly thanks to your comment. We plotted all the raw pressure data and we detected one subject who used the keyboard instead of the Arduino sensors. The pressures of subj. 33443 (Exp. 2) were always at baseline levels (Supp. Figure 18). This subj did respond accurately to the task, meaning that they used the keyboard (we explain this on Page 6 of the new manuscript).

The reason this could happen was that we coded the first experiment (Exp. 1) so that subjects had to use the keyboard. In Exp. 2 we used the same base code (plus the Arduino stuff) and even though we verbally explained to all subjects to use the Arduino sensors, we kept, by mistake, the original instructions of Exp. 1 on screen. Subj. 33443 was the only subject (across Exp. 2 and 3) who mistakenly used the keyboard (see Supplemental Figure 18). We excluded this subject from this revision. Importantly, the main pattern of results did not change.

or the computation of parameters for analysis, such as average or peak force per trial, or many other candidates. None of this is explained.

For this revision we provide a concrete example for further clarity (Data Analysis: Page 10). Specifically, for analysis we used the peak force during each trial, normalized by the maximum peak force over the whole session (separately for the left and right sensors). For instance, if subject X's maximum peak force on the left sensor during the whole session was 984, then all left-sensor pressures from that session were divided by 984. We used this metric in the tables; that is, the regression parameters were obtained with standardized peak pressure. In the regressions we now explicitly append “normed” to the variable name.
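
For concreteness, a minimal MATLAB sketch of this normalization (hypothetical traces; our reconstruction rather than the exact analysis script):

% Per-trial peak pressure divided by the maximum peak across the whole
% session, computed separately for the left and right sensors.
trialsLeft = {[3 120 640 984 210 8], [2 95 720 350 12], [5 60 410 30 4]};
peakLeft = cellfun(@max, trialsLeft);        % per-trial peaks: [984 720 410]
peakLeft_normed = peakLeft ./ max(peakLeft); % session max is 984 -> [1.00 0.73 0.42]
% The right sensor is normalized the same way with its own session maximum.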

4. I am unhappy with various minor aspects of this revision: The authors’ claim to defend their sample size of 50 participants “in studies with similar sample sizes (1,2,4,8)” (p. 6) is misleading readers because only Experiment 2 of ref. 4 has 55 participants, while all others have between 16 and 27.

We replaced the expression “similar sample size” with the actual range of 16 to 55. Thus, our sample size is on the larger end.

My request to determine power or sensitivity retrospectively was not addressed.

We calculated power using response times from our participants, since it is known in the literature that confidence and response times correlate (Participants: Page 7). Overall, the obtained power is large.

Some wording is still poor: On page 7 “Exp. 1 was run in a 13-inch laptop” (should be “on”),

Solved.

or “Uncertainty is a more general concept as it is not a conditioned in choice.” (p. 5) is ungrammatical.

We dropped that expression and instead clarify the distinction between uncertainty and confidence in the model, because it is more concrete for the reader (DDM model: Page 13): “It is important to highlight that uncertainty UN is not the same as confidence in the model. UN is the base standard deviation of the inference, while confidence is the cumulative probability that μdv is greater than zero after a choice is made (i.e. after dv arrives at one of the thresholds).”
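
To make the distinction concrete, here is a toy MATLAB illustration under simplifying assumptions (flat-prior Gaussian inference on the drift); the parameter values are made up and this is not the fitted model from the manuscript:

% Evidence samples with mean drift mu and base standard deviation UN are
% accumulated into dv until a bound is reached; after the choice, confidence
% is the probability that the mean drift is positive given the samples.
rng(1);
mu = 0.05; UN = 1; bound = 20;           % illustrative values only
dv = 0; samples = zeros(1, 0);
while abs(dv) < bound
    x = mu + UN * randn;                 % one noisy evidence sample
    samples(end+1) = x;
    dv = dv + x;
end
muHat = mean(samples);                           % inferred drift after the choice
seHat = std(samples) / sqrt(numel(samples));     % its standard error
confidence = 0.5 * (1 + erf(muHat / (seHat * sqrt(2))));  % P(mu_dv > 0 | samples)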

Several references contain the superfluous word “internet”.

We did not realize that Mendeley was doing that, sorry.

OTHER IMPORTANT CLARIFICATIONS

• In Exp. 3, by mistake, we standardized by a value that was not the overall peak pressure but the initial value in the Arduino signal that exceeded a low threshold. We corrected this oversight, which is why some regression estimates of Exp. 3 differ in this revision. Importantly, the overall pattern/direction/significance of the effects is identical, with only slightly different estimates (i.e., both normalizing constants are correlated because each trial is a time series). This oversight did not affect the conclusions, and the corrections, if any, consolidated the overall results.

The oversight happened because in Exp. 3 we decided to change the serial function that Matlab uses to communicate with the Arduino (the old serial function will be deprecated; the new one is serialport). When we changed the code we marked events slightly differently. During data analysis we missed changing some of those marks and ended up mistakenly normalizing by the maximum, across trials, of the first pressure reading on each trial that exceeded a low-pressure threshold, rather than by the true peak pressure. As we mentioned, we corrected this in the current revision.

• In Table 1 of the previous revision we did not include the variable controlling for error feedback. We fixed that in this revision.

• The only tables that include interaction effects are the theoretically relevant ones. On Page 11: “We included interactions in the regressions when it was theoretically relevant. Namely when the dependent variable was response times (distinct slopes by numeral type suggest different mental magnitudes) and pressure (Figure 1, central panel shows distinct slopes for correct and incorrect trials).”
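
As an illustration of such an interaction term, a hedged sketch with synthetic data (fitlme and the random-intercept structure are our choice for the example; the authors' exact panel specification may differ):

% Response times regressed on numerical distance, numeral type, and their
% interaction, with a random intercept per participant (synthetic data).
rng(2);
nSubj = 10; nTrials = 40; N = nSubj * nTrials;
subj     = repelem((1:nSubj)', nTrials);
dist     = randi(8, N, 1);                  % numerical distance in the pair
negative = randi([0 1], N, 1);              % 1 = negative-numeral pair
rt = 0.60 - 0.020*dist + 0.080*negative + 0.010*dist.*negative + 0.05*randn(N, 1);
tbl = table(rt, dist, negative, categorical(subj), ...
            'VariableNames', {'rt', 'dist', 'negative', 'subj'});
mdl = fitlme(tbl, 'rt ~ dist*negative + (1|subj)');  % dist:negative is the interaction
disp(mdl.Coefficients)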

Reviewer #2:

The authors have addressed most of the raised points. In my previous review I focused on the model, as it is interesting in my opinion, and the authors have included a wider reflection on this issue.

Attachment

Submitted filename: Reviewers_2_PONE.docx

Decision Letter 2

Federico Giove

30 May 2022

PONE-D-21-25521R2

Reduced choice-confidence in negative numerals

PLOS ONE

Dear Dr. Alonso-Diaz,

This is a formal decision to allow the authors to address a wrong submission.

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 14 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Federico Giove, PhD

Academic Editor

PLOS ONE


Reviewers' comments:

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Oct 3;17(10):e0272796. doi: 10.1371/journal.pone.0272796.r006

Author response to Decision Letter 2


31 May 2022

Reviewer #1:

I have read the cover letter and the revised ms plus the supplementary document and my impression is mixed.

1. The authors made several adjustments and additions that strengthen the ms, including explanations of their idiosyncratic terminology (“dummy trials” for filler trials) and references to relevant papers that were previously omitted. However, repeatedly directing readers to Ganor-Stern & Tzelgov (2008) for the component model is still suboptimal because the paper by Huber (their Ref 5) constitutes its more recent development.

We now extend the component model explanation with the Huber et al neural network (Intro.: Pages 3-4).

Similarly, the introduction of “button pressure” as the measure of interest (on page 4) is immediately followed by references to “motor metrics”, i.e. kinematic studies, even though the “button pressure” is an isometric assessment without any kinematics involved. In my idiosyncratic view these are suboptimal revisions.

We wanted to show to a general audience that “motor metrics” are used in cognition. We understand that those references were about kinematics. Thus, we now drop those kinematics references to avoid any confusion.

2. More importantly, the methods descriptions are still not detailed enough to permit replication. Although stimulus ranges are now clear and the recording apparatus for “button pressure” is now explained in more detail, the sensitivity of this device is still unclear. If the range is 0.25 to 0.90, does this mean that 65 levels of pressure were discriminable

The range 0.25-0.90 is continuous. There are more than 65 discrete levels.

The actual range in the raw data is 0.025 to 1 (Exp. 2) and 0.05 to 1 (Exp. 3). The 0.25-0.9 range was obtained after averaging per numerical distance. In the new manuscript (Data analysis: Page 10) we decided it was clearer to present the ranges from the raw data, because the right end is 1 and the left end is close to zero.

while the actual force produced (in Pascal or Newton) remains unknown?

The sensor detects pressure as a change in resistance. The details of the resistors and Arduino wiring are in the manuscript and supplemental information. The Arduino transforms those changes of resistance into a discrete signal between 0 and 1023. All subjects used the same Arduino setup (including voltage and resistors). Thus, we do not report Pascals or Newtons (the transformation from change in resistance to Newtons or Pascals is not trivial). Still, the change in resistance indeed relates to force; resistance-based pressure is used in many current devices including mobile phones and many portable devices.

Another open issue, of fundamental importance for the later discussion about early attentional vs later conceptual effects: What was the sampling frequency (i.e. the temporal resolution of pressure measurements)?

We now report the sampling frequency of the Arduino board in the manuscript (Apparatus, pages 7-8); it is approximately 26 Hz (see the manuscript for details).

It is not fast enough to fully disentangle early attention and later concepts (holistic magnitudes).

In the discussion we acknowledge that it could be attention, but recasting all our tables and results as attention rather than confidence would require some ad hoc accommodations.

As for concepts, we do think that confidence results speak about concepts, namely the existence of negative holistic magnitudes (see discussion section). However, we are also aware that our paper only brings partial clarity to negative magnitude processing. Our paper is just about confidence.

3. Moreover, despite my extensive queries on this point, I still find only a SINGLE sentence that describes the force data analysis, namely “Data analysis. We used panel linear regressions to analyze the accuracy (linear probability model), response times up to two standard deviations from the mean (i.e., 95% of the trials), and button pressure”.

We apologize for our lack of clarity; it was not our intention. In the first revision we tried to address all your comments, but we clearly failed. We hope that our responses below clarify our data analysis.

This leaves open fundamental questions such as the time during which force was recorded or integrated,

We analyzed signals from the moment the numerals appeared on screen until the subject selected an option and was no longer pressing the force sensor (i.e., force reading < low_pressure_threshold); this applied to all trials (Data Analysis: Page 10).

the data filtering or trimming for this inherently noisy signal,

We did not filter Arduino signals. When the sensor is pressed it sends a reading to Matlab based on the participant’s pressure. In the figure below, each trial has a stereotypical look: a peak. This means that participants had a brief contact with the sensor (here we plot the raw Arduino signal, before any standardization) (see all raw data in Supplemental Figures 18 and 19).

We filtered data on the behavior side. We reported in the manuscript two criteria: response time greater than 2 standard deviations from the mean of all the data and accuracy less than 85%.

We did detect an anomaly thanks to your comment. We plotted all the raw pressure data and we detected one subject who used the keyboard instead of the Arduino sensors. The pressures of subj. 33443 (Exp. 2) were always at baseline levels (Supp. Figure 18). This subj did respond accurately to the task, meaning that they used the keyboard (we explain this on Page 6 of the new manuscript).

The reason this could happen was that we coded the first experiment (Exp. 1) so that subjects had to use the keyboard. In Exp. 2 we used the same base code (plus the Arduino stuff) and even though we verbally explained to all subjects to use the Arduino sensors, we kept, by mistake, the original instructions of Exp. 1 on screen. Subj. 33443 was the only subject (across Exp. 2 and 3) who mistakenly used the keyboard (see Supplemental Figure 18). We excluded this subject from this revision. Importantly, the main pattern of results did not change.

or the computation of parameters for analysis, such as average or peak force per trial, or many other candidates. None of this is explained.

For this revision we provide a concrete example for further clarity (Data Analysis: Page 10). Specifically, for analysis we used the peak force during each trial, normalized by the maximum peak force over the whole session (separately for the left and right sensors). For instance, if subject X's maximum peak force on the left sensor during the whole session was 984, then all left-sensor pressures from that session were divided by 984. We used this metric in the tables; that is, the regression parameters were obtained with standardized peak pressure. In the regressions we now explicitly append “normed” to the variable name.

4. I am unhappy with various minor aspects of this revision: The authors’ claim to defend their sample size of 50 participants “in studies with similar sample sizes (1,2,4,8)” (p. 6) is misleading readers because only Experiment 2 of ref. 4 has 55 participants, while all others have between 16 and 27.

We replaced the expression “similar sample size” with the actual range of 16 to 55. Thus, our sample size is on the larger end.

My request to determine power or sensitivity retrospectively was not addressed.

We calculated power using response times from our participants, since it is known in the literature that confidence and response times correlate (Participants: Page 7). Overall, the obtained power is large.

Some wording is still poor: On page 7 “Exp. 1 was run in a 13-inch laptop” (should be “on”),

Solved.

or “Uncertainty is a more general concept as it is not a conditioned in choice.” (p. 5) is ungrammatical.

We dropped that expression and instead clarify the distinction between uncertainty and confidence in the model, because it is more concrete for the reader (DDM model: Page 13): “It is important to highlight that uncertainty UN is not the same as confidence in the model. UN is the base standard deviation of the inference, while confidence is the cumulative probability that μdv is greater than zero after a choice is made (i.e. after dv arrives at one of the thresholds).”

Several references contain the superfluous word “internet”.

We did not realize that Mendeley was doing that, sorry.

OTHER IMPORTANT CLARIFICATIONS

• In Exp. 3, by mistake, we standardized by a value that was not the overall peak pressure but the initial value in the Arduino signal that exceeded a low threshold. We corrected this oversight, which is why some regression estimates of Exp. 3 differ in this revision. Importantly, the overall pattern/direction/significance of the effects is identical, with only slightly different estimates (i.e., both normalizing constants are correlated because each trial is a time series). This oversight did not affect the conclusions, and the corrections, if any, consolidated the overall results.

The oversight happened because in Exp. 3 we decided to change the serial function that Matlab uses to communicate with the Arduino (the old serial function will be deprecated; the new one is serialport). When we changed the code we marked events slightly differently. During data analysis we missed changing some of those marks and ended up mistakenly normalizing by the maximum, across trials, of the first pressure reading on each trial that exceeded a low-pressure threshold, rather than by the true peak pressure. As we mentioned, we corrected this in the current revision.

• In Table 1 of the previous revision we did not include the variable controlling for error feedback. We fixed that in this revision.

• The only tables that include interaction effects are the theoretically relevant ones. On Page 11: “We included interactions in the regressions when it was theoretically relevant. Namely when the dependent variable was response times (distinct slopes by numeral type suggest different mental magnitudes) and pressure (Figure 1, central panel shows distinct slopes for correct and incorrect trials).”

• The manuscript is on psyarxiv with figures on the manuscript (https://psyarxiv.com/sdfb9/)

Reviewer #2:

The authors have addressed most of the raised points. In my previous review I focused on the model, as it is interesting in my opinion, and the authors have included a wider reflection on this issue.

Attachment

Submitted filename: Reviewers_2_PONE.docx

Decision Letter 3

Federico Giove

29 Jun 2022

PONE-D-21-25521R3

Reduced choice-confidence in negative numerals

PLOS ONE

Dear Dr. Alonso-Diaz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

One of the original reviewers (#1) suggested rejection, based on an unsatisfactory response to his/her comments. I involved a third reviewer. While generally praising your work, reviewer #3 raised a number of further concerns. I encourage the authors to address all the new criticisms raised by reviewer #3, as well as the remaining comments of reviewer #1. Please include a response to reviewers that also covers #1.

Please submit your revised manuscript by Aug 13 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Federico Giove, PhD

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: This manuscript deals with a question that, in my opinion, looks highly relevant for many applications: do negative numbers induce more uncertainty than positive ones? It does so by resorting to different strategies: hardware that allows graded responses in the form of pressure, cognitive modelling with the diffusion model, and so on. In general, I like the idea behind this research. However, I had problems fully understanding what the manuscript is describing, in part because the text organization is not clear.

I will describe next a few comments that may help to improve the manuscript:

1. Text organization

I found the Introduction well written and easy to follow. Not being an expert myself, I think that the Introduction contains the basic information needed to understand the purpose and justification of the studies.

The rest of the manuscript is organized in a way that, in my opinion, does not help to understand what is being described:

-The three experiments are described at the same time, which makes it difficult to understand the differences between them and why one needs to conduct three experiments at all. A more traditional organization with each experiment being described (together with its justification) alone would be better, in my opinion. If the experiments have very similar procedures, you could just refer to that explanation in previous sections. Now the motivation to run Experiment 3, for instance, seems completely overlooked (it is mentioned somewhere at the beginning and it seems to never matter again).

-The same happens with the Data Analysis, Drift-Diffusion model, and Results sections. It is hard to keep in mind all the information when one goes through these sections. The Results section, for instance, is just a collection of tables, full of numbers, with little guidance in the text. I would prefer a more traditional organization: experiment-wise, with a description of the data analyses that will be later reported (and explained!) in the Results section. Maybe, too, a final section for the modelling that could include the three experiments, if you want to. Or perhaps the diffusion model can be described in the first Data Analysis section, and then in subsequent subsections of the Results sections for each experiment. Now it is all a bit too mixed up.

2. Theoretical implications.

Admittedly, I am not an expert on numerical cognition. From my reading of the Introduction, I got that there are two competing theories that make different predictions concerning negative numbers. But it is not clear to me whether and how the current studies help in addressing these questions. Would it be possible to connect the results to these theories, perhaps favouring one over the other? (It seems that the diffusion model results indicate that differences between numbers are not an encoding effect)

My impression is that, despite the highly interesting theoretical debate, the results of these experiments are just descriptive: they suggest that there is an uncertainty burden in negative numbers, at least in this task. Thus, we cannot advance too much in theory without further research.

I was curious about a potential extension of these results to different tasks. Would negative numbers produce more uncertain answers in any type of task? For instance, the task here is very simple and involves recognizing negative numbers. I don’t know whether the result generalizes to production tasks. What would happen if, in any numerical task (solving mathematical problems, or just emitting judgments), the responses that are negative also produced more uncertainty?

3. Minor comments:

-Power analyses. The authors conduct post hoc power analyses. This is not recommended (Althouse, 2021; Hoenig & Heisey, 2001). Post hoc power estimations are *determined* by the p-value. So, once you get a significant result, what is the point in computing power?

What can we do, then? Ideally, power calculations should be conducted *before* data collection (“a priori power analyses”), so that you determine your sample size given an estimated effect size and a desired power level. However, this is not easy to do, as the effect size must still be estimated (which is hard, as the literature is often biased).

Thus, I assume that you did not conduct power analyses a priori. Then, what you can do is conduct sensitivity analyses. Sensitivity analyses can be conducted a priori or a posteriori, and they reach a compromise between power goals and practical issues. You can use G*Power or any other software to: (1) decide which sample size you are willing to collect (or have already collected) for practical/ethical/economic reasons, (2) fix a power level (e.g., 80%), and (3) solve for the minimal effect size that you can detect.

If the sensitivity analyses reveal that you can detect reliably (with good power) even small effects, then your study is well powered for those effects. If you can only detect large effects, that means that you have low power for small effects.

Another thing you can do is reporting effect sizes and confidence intervals. Even if the results are significant, a large interval suggests that the study is not informative.

-Lots of t-tests were conducted (e.g. with diffusion model parameters, Table 5) without (as far as I can see) multiple contrasts protection.

- P. 18: “However, the term…” (do you mean “interaction term”?)

References:

Althouse A. D. (2021). Post Hoc Power: Not Empowering, Just Misleading. The Journal of surgical research, 259, A3–A6. https://doi.org/10.1016/j.jss.2019.10.049

Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19–24.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Oct 3;17(10):e0272796. doi: 10.1371/journal.pone.0272796.r008

Author response to Decision Letter 3


11 Jul 2022

#REVIEWER 3

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: This manuscript deals with a question that, in my opinion, looks highly relevant for many applications: do negative numbers induce more uncertainty than positive ones? It does so by resorting to different strategies: hardware that allows graded responses in the form of pressure, cognitive modelling with the diffusion model, and so on. In general, I like the idea behind this research. However, I had problems fully understanding what the manuscript is describing, in part because the text organization is not clear.

Response: Thanks for your positive words, we appreciate them. We hope that we address your concerns in this revised manuscript.

I will describe next a few comments that may help to improve the manuscript:

1. Text organization

I found the Introduction well written and easy to follow. Not being an expert myself, I think that the Introduction contains the basic information needed to understand the purpose and justification of the studies.

The rest of the manuscript is organized in a way that, in my opinion, does not help to understand what is being described:

-The three experiments are described at the same time, which makes it difficult to understand the differences between them and why one needs to conduct three experiments at all. A more traditional organization with each experiment being described (together with its justification) alone would be better, in my opinion. If the experiments have very similar procedures, you could just refer to that explanation in previous sections. Now the motivation to run Experiment 3, for instance, seems completely overlooked (it is mentioned somewhere at the beginning and it seems to never matter again).

Response: In the new manuscript we separated the experiments into different sections, and the motivation for each one is now clearly explained at its beginning. Exp. 1 is a traditional keyboard experiment that allows us to show some confidence predictions from the drift-diffusion model (DDM). Exps. 2 and 3 confirm one of those predictions and replicate the DDM results. Exp. 3 further controls for important methodological aspects.

-The same happens with the Data Analysis, Drift-Diffusion model, and Results sections. It is hard to keep in mind all the information when one goes through these sections. The Results section, for instance, is just a collection of tables, full of numbers, with little guidance in the text. I would prefer a more traditional organization: experiment-wise, with a description of the data analyses that will be later reported (and explained!) in the Results section.

Response: Now each experiment has its own data analysis and results sections, and the drift-diffusion model is explained only in Exp. 1. By separating the experiments, we think the explanations are now clearer.

Maybe, too, a final section for the modelling that could include the three experiments, if you want to. Or perhaps the diffusion model can be described in the first Data Analysis section, and then in subsequent subsections of the Results sections for each experiment. Now it is all a bit too mixed up.

Response: We now describe the drift-diffusion model in the first Data Analysis section (in Exp. 1) and then present the results for each experiment in its own subsection.

2. Theoretical implications.

Admittedly, I am not an expert on numerical cognition. From my reading of the Introduction, I got that there are two competing theories that make different predictions concerning negative numbers. But it is not clear to me whether and how the current studies help in addressing these questions. Would it be possible to connect the results to these theories, perhaps favouring one over the other? (It seems that the diffusion model results indicate that differences between numbers are not an encoding effect)

Response: The results indeed point to holistic representations of numbers, not merely an encoding effect. The diffusion model and the response-time patterns in Exps. 1 and 2 certainly suggest that. In the discussion section we are now more explicit about this (Page 32).

My impression is that, despite the highly interesting theoretical debate, the results of these experiments are just descriptive: they suggest that there is an uncertainty burden in negative numbers, at least in this task. Thus, we cannot advance too much in theory without further research.

Response: Indeed, button pressure alone cannot disentangle which theory is correct (at least with our design). We mention this on Page 32. Still, response times and the drift-diffusion model favor holistic representations, and we also mention this on Page 32. However, given the assumptions behind the drift-diffusion model, we remain conservative and mention that further research is required.

I was curious about a potential extension of these results to different tasks. Would negative numbers produce more uncertain answers in any type of task? For instance, the task here is very simple and involves recognizing negative numbers. I don’t know whether the result generalizes to production tasks. What would happen if, in any numerical task (solving mathematical problems, or just emitting judgments), the responses that are negative also produced more uncertainty?

Response: In this new revision, the last sentence now suggests such critical future questions.

3. Minor comments:

-Power analyses. The authors conduct post hoc power analyses. This is not recommended (Althouse, 2021; Hoenig & Heisey, 2001). Post hoc power estimations are *determined* by the p-value. So, once you get a significant result, what is the point in computing power?

What can we do, then? Ideally, power calculations should be conducted *before* data collection (“a priori power analyses”), so that you determine your sample size given an estimated effect size and a desired power level. However, this is not easy to do, as the effect size must still be estimated (which is hard, as the literature is often biased).

Thus, I assume that you did not conduct power analyses a priori. Then, what you can do is conduct sensitivity analyses. Sensitivity analyses can be conducted a priori or a posteriori, and they reach a compromise between power goals and practical issues. You can use G*Power or any other software to: (1) decide which sample size you are willing to collect (or have already collected) for practical/ethical/economic reasons, (2) fix a power level (e.g., 80%), and (3) solve for the minimal effect size that you can detect.

If the sensitivity analyses reveal that you can detect reliably (with good power) even small effects, then your study is well powered for those effects. If you can only detect large effects, that means that you have low power for small effects.

Another thing you can do is reporting effect sizes and confidence intervals. Even if the results are significant, a large interval suggests that the study is not informative.

Response: We fully agree with you. In the original manuscript we did not make post-hoc justifications; as the review process advanced, we were trying to meet the requirements of previous reviewers. We agree that a sensitivity analysis is a good middle ground. We now provide a sensitivity analysis for Exp. 1. For Exp. 2 we mention that we preregistered a maximum sample size, and Exp. 3 also follows that cap.
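
For reference, a small MATLAB sketch of the kind of sensitivity calculation described above (sampsizepwr from the Statistics and Machine Learning Toolbox is used here instead of G*Power; the sample size and power level are illustrative, not necessarily the values in the manuscript):

% Fix the collected sample size and the desired power, then solve for the
% smallest standardized effect a one-sample/paired t-test could detect.
n = 50; desiredPower = 0.80;
dMin = sampsizepwr('t', [0 1], [], desiredPower, n);   % minimal detectable d (alpha = .05)
fprintf('With n = %d and %.0f%% power, minimal detectable d = %.2f\n', ...
        n, 100*desiredPower, dMin);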

-Lots of t-tests were conducted (e.g. with diffusion model parameters, Table 5) without (as far as I can see) multiple contrasts protection.

Response: We corrected for multiple comparisons with the Holm-Sidak method (Page 12).
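
For clarity, a minimal sketch of a Holm-Sidak step-down adjustment (our illustration with made-up p-values, not necessarily the exact routine used in the manuscript):

% Sort the raw p-values, apply the Sidak correction with step-down counts,
% enforce monotonicity, and map the adjusted values back to the input order.
p = [0.001 0.012 0.030 0.200];            % hypothetical raw p-values from the t-tests
m = numel(p);
[pSorted, order] = sort(p);
pAdj = 1 - (1 - pSorted) .^ (m:-1:1);     % Sidak adjustment per rank
pAdj = cummax(pAdj);                      % step-down monotonicity
pAdjusted(order) = pAdj;                  % back to the original order
significant = pAdjusted < 0.05;           % reject where adjusted p < .05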

- P. 18: “However, the term…” (do you mean “interaction term”?)

Response: Yes, you are right, we corrected that.

References:

Althouse A. D. (2021). Post Hoc Power: Not Empowering, Just Misleading. The Journal of surgical research, 259, A3–A6. https://doi.org/10.1016/j.jss.2019.10.049

Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19–24.

________________________________________

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

________________________________________

Reviewer #1:

NOTE: THIS IS FROM THE PREVIOUS ROUND (WITH MINOR EDITS). WE INCLUDE IT AT THE EDITOR’S REQUEST.

1. The authors made several adjustments and additions that strengthen the ms, including explanations of their idiosyncratic terminology (“dummy trials” for filler trials) and references to relevant papers that were previously omitted. However, repeatedly directing readers to Ganor-Stern & Tzelgov (2008) for the component model is still suboptimal because the paper by Huber (their Ref 5) constitutes its more recent development.

We now extend the component model explanation with the Huber et al neural network (Intro.: Pages 3-4).

Similarly, the introduction of “button pressure” as the measure of interest (on page 4) is immediately followed by references to “motor metrics”, i.e. kinematic studies, even though the “button pressure” is an isometric assessment without any kinematics involved. In my idiosyncratic view these are suboptimal revisions.

We wanted to show to a general audience that “motor metrics” are used in cognition. We understand that those references were about kinematics. Thus, we now drop those kinematics references to avoid any confusion.

2. More importantly, the methods descriptions are still not detailed enough to permit replication. Although stimulus ranges are now clear and the recording apparatus for “button pressure” is now explained in more detail, the sensitivity of this device is still unclear. If the range is 0.25 to 0.90, does this mean that 65 levels of pressure were discriminable

The range 0.25-0.90 is continuous. There are more than 65 discrete levels.

The actual range in the raw data is 0.025 to 1 (Exp. 2) and 0.05 to 1 (Exp. 3). The 0.25-0.9 range was obtained after averaging per numerical distance. In the new manuscript (Data analysis) we decided it was clearer to present the ranges from the raw data, because the right end is 1 and the left end is close to zero.

while the actual force produced (in Pascal or Newton) remains unknown?

The sensor detects pressure as a change in resistance. The details of the resistors and Arduino wiring are in the manuscript and supplemental information. The Arduino transforms those changes of resistance into a discrete signal between 0 and 1023. All subjects used the same Arduino setup (including voltage and resistors). Thus, we do not report Pascals or Newtons (the transformation from change in resistance to Newtons or Pascals is not trivial). Still, the change in resistance indeed relates to force; resistance-based pressure is used in many current devices including mobile phones and many portable devices.

Another open issue, of fundamental importance for the later discussion about early attentional vs later conceptual effects: What was the sampling frequency (i.e. the temporal resolution of pressure measurements)?

We now report the sampling frequency of the Arduino board in the manuscript (Apparatus, pages 7-8); it is approximately 26 Hz (see the manuscript for details).

It is not fast enough to fully disentangle early attention and later concepts (holistic magnitudes).

In the discussion we accept that it could be attention, but that would require some ad hoc accommodations to reinterpret all our tables and results as being about attention rather than confidence.

We do think that the confidence results speak to concepts, namely the existence of negative holistic magnitudes (see Discussion section). However, we are also aware that our paper brings only partial clarity to negative magnitude processing. Our paper is just about confidence.

3. Moreover, despite my extensive queries on this point, I still find only a SINGLE sentence that describes the force data analysis, namely “Data analysis. We used panel linear regressions to analyze the accuracy (linear probability model), response times up to two standard deviations from the mean (i.e., 95% of the trials), and button pressure”.

We apologize for our lack of clarity; it was not our intention. In the first revision we tried to address all your comments, but we clearly failed. We hope that our responses below clarify our data analysis.

This leaves open fundamental questions such as the time during which force was recorded or integrated,

We analyzed signals from the moment the numerals appeared on screen until the subject selected an option and was no longer pressing the force sensor (i.e., force resistance < low_pressure_threshold); this applies to all trials (Data Analysis).
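For illustration only, a minimal sketch of such a trial window under assumed inputs (one pressure trace per trial, sampled from stimulus onset, and a low-pressure threshold); it is not the authors’ code.

import numpy as np

def trial_window(trace, low_pressure_threshold=0.05):
    """Keep samples from stimulus onset until the sensor is released,
    i.e., until the signal falls back below the low-pressure threshold."""
    trace = np.asarray(trace, dtype=float)
    above = np.flatnonzero(trace >= low_pressure_threshold)
    if above.size == 0:
        return trace          # no press detected; keep the whole trace
    release = above[-1] + 1   # first sample after the last above-threshold sample
    return trace[:release]

print(trial_window([0.0, 0.1, 0.6, 0.9, 0.4, 0.02, 0.0]))  # -> [0.  0.1 0.6 0.9 0.4]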

the data filtering or trimming for this inherently noisy signal,

We did not filter Arduino signals. When the sensor is pressed it sends a reading to Matlab based on the participant’s pressure. In the figure below (see the .docx document rather than the Editorial Manager pdf), each trial has a stereotypical look: a peak. This means that participants made brief contact with the sensor (here we plot the raw Arduino signal, before any standardization) (see all raw data in Supplemental Figures 18 and 19).

We did, however, filter data on the behavioral side. As reported in the manuscript, we used two criteria: response times greater than 2 standard deviations from the mean of all the data, and accuracy below 85%.

We did detect an anomaly thanks to your comment. We plotted all the raw pressure data and found one subject who used the keyboard instead of the Arduino sensors. The pressures of subject 33443 (Exp. 2) were always at baseline levels (Supp. Figure 18). This subject did respond accurately to the task, meaning that they used the keyboard (we explain this in the Participants section of the new manuscript).

The reason this could happen is that we coded the first experiment (Exp. 1) so that subjects had to use the keyboard. In Exp. 2 we used the same base code (plus the Arduino code), and even though we verbally instructed all subjects to use the Arduino sensors, we kept, by mistake, the original on-screen instructions of Exp. 1. Subject 33443 was the only subject (across Exp. 2 and 3) who mistakenly used the keyboard (see Supplemental Figure 18). We excluded this subject from this revision. Importantly, the main pattern of results did not change.

or the computation of parameters for analysis, such as average or peak force per trial, or many other candidates. None of this is explained.

For this revision we now provide a concrete example for further clarity (Data Analysis section). Specifically, for analysis we used the peak force during each trial, normalized by the maximum peak force over the whole session (separately for the left and right sensor). For instance, if subject X’s maximum left peak force during the whole session was 984, then all left pressures during that session were divided by 984. We used this metric in the tables; that is, the regression parameters were obtained with standardized peak pressure. In the regressions we now explicitly append “normed” to the variable names.
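For illustration only, a minimal sketch of this normalization under an assumed data layout (a dict mapping sensor side to a list of per-trial pressure traces); it is not the authors’ code.

import numpy as np

def normalized_peak_pressure(trials_by_side):
    """For each side ('left'/'right'), take the peak pressure of each trial and
    divide it by that side's maximum peak across the whole session."""
    normed = {}
    for side, traces in trials_by_side.items():
        peaks = np.array([np.max(trace) for trace in traces])
        normed[side] = peaks / peaks.max()   # e.g., if the left session maximum is 984,
                                             # all left peaks are divided by 984
    return normed

# Example with arbitrary raw values (two trials per side)
example = {"left": [[10, 500, 984], [5, 300, 120]],
           "right": [[20, 700, 650], [15, 900, 880]]}
print(normalized_peak_pressure(example))  # left peaks -> [1.0, 0.3049...], right -> [0.777..., 1.0]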

4. I am unhappy with various minor aspects of this revision: The authors’ claim to defend their sample size of 50 participants “in studies with similar sample sizes (1,2,4,8)” (p. 6) is misleading readers because only Experiment 2 of ref. 4 has 55 participants, while all others have between 16 and 27.

We dropped the expression “similar sample size” and now report the actual range of 16 to 55. Thus, our sample size is on the larger end.

My request to determine power or sensitivity retrospectively was not addressed.

We calculated power using response times from our participants, since it is known in the literature that confidence and response times correlate (Participants: page 7). Overall, the obtained power is large.
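For illustration only, a sketch of a related retrospective sensitivity analysis: solving for the minimal detectable effect size given the sample size, alpha = 0.05, and 80% power. The one-sample/paired t-test setup and effect-size metric are assumptions; the authors’ own calculation, based on response times, may differ.

from statsmodels.stats.power import TTestPower

# Solve for the effect size detectable with n = 50, alpha = 0.05, power = 0.80
# (placeholder test family; not necessarily the authors' analysis)
analysis = TTestPower()
detectable_d = analysis.solve_power(effect_size=None, nobs=50,
                                    alpha=0.05, power=0.80)
print(f"Minimal detectable effect size (Cohen's d): {detectable_d:.2f}")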

Some wording is still poor: On page 7 “Exp. 1 was run in a 13-inch laptop” (should be “on”),

Solved.

or “Uncertainty is a more general concept as it is not a conditioned in choice.” (p. 5) is ungrammatical.

We dropped that expression and instead clarify the distinction between uncertainty and confidence in the model, which is more concrete for the reader (DDM model, page 13): “It is important to highlight that uncertainty UN is not the same as confidence in the model. UN is the base standard deviation of the inference, while confidence is the cumulative probability that μdv is greater than zero after a choice is made (i.e. after dv arrives at one of the thresholds).”
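For illustration only, a numerical sketch of the confidence reading quoted above, i.e., the probability that μdv exceeds zero at the moment of choice, assuming a Gaussian posterior over μdv with placeholder mean and spread (the posterior form and the values are assumptions, not the fitted model).

from scipy.stats import norm

# Placeholder posterior over mu_dv at the moment of choice (assumed Gaussian)
mu_hat, sd_hat = 0.8, 1.2
confidence = 1 - norm.cdf(0, loc=mu_hat, scale=sd_hat)   # P(mu_dv > 0)
print(f"Confidence = P(mu_dv > 0) = {confidence:.2f}")   # ~0.75 with these placeholder values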

Several references contain the superfluous word “internet”.

We did not realize that Mendeley was doing that; we apologize.

OTHER IMPORTANT CLARIFICATIONS

• In Exp. 3, by mistake, we standardized by a value that was not the overall peak pressure but the initial peak value in the Arduino signal above a low threshold. We corrected this oversight, which is why some regression estimates of Exp. 3 differ in this revision. Importantly, the overall pattern/direction/significance of the effects is identical; only the estimates differ slightly (both constants are correlated because each trial is a time series). This oversight did not affect the conclusions.

The oversight happened because in Exp. 3 we decided to change the serial function used by Matlab to communicate with the Arduino (it will be deprecated; the new one is serialport). Exp. 3 was the experiment that Reviewer 1 requested, i.e., it was a new experiment, and when we changed the code to include the new conditions and Matlab function, the code we wrote marked events slightly differently. During data analysis we missed changing some of the marks and, as the normalizing value, we mistakenly used the peak across trials of the first pressure signal sent by the Arduino that exceeded a low-pressure threshold on each trial. As mentioned, we corrected this in the current revision.

• In Table 1 of the previous revision we did not include the variable controlling for error feedback. In this revision we fixed that.

• The only tables that include interaction effects are the theoretically relevant ones. On page 11: “We included interactions in the regressions when it was theoretically relevant. Namely when the dependent variable was response times (distinct slopes by numeral type suggest different mental magnitudes) and pressure (Figure 1, central panel shows distinct slopes for correct and incorrect trials).”

• The manuscript is on PsyArXiv with the figures included in the manuscript (https://psyarxiv.com/sdfb9/).

Attachment

Submitted filename: Reviewers_3_PONE.docx

Decision Letter 4

Federico Giove

27 Jul 2022

Reduced choice-confidence in negative numerals

PONE-D-21-25521R4

Dear Dr. Alonso-Diaz,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Federico Giove, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: I think the revised manuscript addresses my previous comments. Specifically:

-The text organization is much better, which makes it easier to read.

-The purpose of the experiments is clearer.

-The connection to previous literature/hypotheses is also better explained.

Still, I am not sure that I fully understand the statistical analyses and some other technical details about the pressure measure (that other reviewers seem to find important), so although I think that they are sound, please take my comments with a grain of salt.

Minor comments:

-When reporting sensitivity analyses, you fix alpha (the error rate) to 0.05, rather than “p” as you describe: “we did a sensitivity analysis to check for the required effect size given an 80% power, p<0.05, …”. The p-value is a statistic you obtain from the data and it is not fixed beforehand.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

**********

Acceptance letter

Federico Giove

1 Aug 2022

PONE-D-21-25521R4

Reduced choice-confidence in negative numerals

Dear Dr. Alonso-Diaz:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Federico Giove

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    (DOCX)

    Attachment

    Submitted filename: Reviewers_1_PONE.docx

    Attachment

    Submitted filename: Reviewers_2_PONE.docx

    Attachment

    Submitted filename: Reviewers_2_PONE.docx

    Attachment

    Submitted filename: Reviewers_3_PONE.docx

    Data Availability Statement

    Data is available in the Open Science Framework (DOI https://osf.io/yuvz5/).

