PLOS One. 2022 Mar 16;17(3):e0264830. doi: 10.1371/journal.pone.0264830

Self-selected interval judgments compared to point judgments: A weight judgment experiment in the presence of the size-weight illusion

Nichel Gonzalez 1,*, Ola Svenson 1,2, Magnus Ekström 3,4, Bengt Kriström 5, Mats E Nilsson 1
Editor: Piers D L Howe 6
PMCID: PMC8926213  PMID: 35294471

Abstract

Measurements of human attitudes and perceptions have traditionally used numerical point judgments. In the present study, we compared conventional point estimates of weight with an interval judgment method. Participants were allowed to make step-by-step judgments, successively converging towards their best estimate. Participants estimated, in grams, the weight of differently sized boxes; the estimates were thus susceptible to the size-weight illusion. The illusion causes the smaller of two objects of the same weight, differing only in size, to be perceived as heavier. The self-selected interval method entails participants judging a highest and lowest reasonable value for the true weight. This is followed by a splitting procedure: consecutive choices between the upper and lower half of the interval, selecting the half that the individual estimates is most likely to include the true value. Compared to point estimates, interval midpoints showed less variability and reduced the size-weight illusion, but only to a limited extent. Accuracy improvements from the interval method were limited, but the between-participant variation suggests that the method has merit.

Introduction

Most measurements include some degree of error, but often it is possible to find an interval that most likely includes the true value. To illustrate, the speed of light has been measured since 1870. Initially, these measurements were given as rather wide intervals, illustrating the uncertainty involved. The intervals became successively smaller and converged towards the current speed of light estimate [1]. Similar to natural science measurement, human judgments also include varying degrees of uncertainty. In the present study, we used an interval judgment method where participants were allowed to make step by step judgments, successively converging towards their best estimate. Hence, the interval judgment method used in this study is conceptually similar to the one describing the physicists’ measurements of speed of light which became more and more exact over the years.

We used an interval judgment method, called self-selected intervals [2, 3], and compared the result with traditional point judgments. The judgment task used for comparisons of the judgment methods was a weight judgment task. The judged objects were boxes of different sizes, thereby creating the size-weight illusion (SWI). To clarify, SWI refers to the fact that if two similar objects have the same weight but are of different size, the smaller of the two will be perceived as heavier, and this illusion is a very robust phenomenon [4]. The inclusion of SWI in our experiment allowed us to test not only the accuracy of judgments, but also whether the interval method reduces the illusion. Furthermore, including the SWI adds an element of perceptual variability within individuals, and we explored whether the interval method reduces this kind of variability. Traditionally, researchers have tried to avoid the stimulus-error [5], for example by using magnitude estimation. However, this study aims to find out whether the interval method can improve accuracy of judgments, in addition to reducing the SWI and yielding more stable estimates. Hence, we have measured a judgment of the true state of the world (objective weight), and such judgments include prior information/knowledge of weight. This means that the accuracy of judgments in this study involves deliberate cognition, while the SWI arises from involuntary cognition at a perceptual level. This way, it was possible to test the effect of self-selected interval judgments on both a perceptual illusion (SWI) and accuracy.

Self-selected intervals

The rationale for using the method of self-selected intervals is that people are not necessarily certain about quantities they may need to estimate, and may over time provide varying judgments of the same object due to uncertainty and noise in their perceptions. By letting participants select the boundaries of uncertainty freely, a more accurate measure of their true experience may be obtained [3].

To clarify the terminology in the context of this study, people estimate the weight of an object, and by reporting that estimate they have made a judgment. In other words, an estimate is what a person perceives and/or thinks, and a judgment is what that person explicitly reports as their estimate. The self-selected intervals (SSI) method has previously been investigated in studies asking for willingness to pay (WTP) [3] and the statistical properties of the method have been investigated as well [2, 6]. However, SSI has not previously been investigated with a task that has an objectively measurable and correct answer. Because weight is objectively measurable in grams, while WTP is subjectively determined by each individual, additional insights can be gained. Furthermore, we have added the size-weight illusion to the design, which adds a perceptual layer to the task. This enables us to further understand the usefulness of the self-selected intervals.

The self-selected interval judgment process was designed as follows. A participant estimated the weight of a box in grams, initially providing a numerical interval that the participant felt included all reasonable values of the true weight. In the next step, this interval, for example 200 to 400 grams, was split into two equally wide intervals, 200–300 and 300–400 in this example. The participant was asked to choose which of the new smaller intervals was more likely to include the true value. The procedure of splitting the interval in half was carried out three times in total to successively converge towards the participant’s best estimate of the box’s true weight. Hence, the final interval midpoint should be a participant’s best estimate, but this point was never judged explicitly by the participants.

Other previous research, using other types of interval judgments, has aimed at indicating how confident a participant was in providing the correct answer. With this type of interval, if a participant states intervals with 90% confidence, 90% of the intervals should include the true value [7]. Teigen and Jørgensen asked participants to give intervals which should include the correct answer to general knowledge items with some degree of confidence, e.g., 90%. In general, such intervals are found to be too narrow, illustrating that people are more confident about their knowledge than is warranted. When [8] replicated this result they also found that participants produced intervals that were very similar even though the levels of confidence were different. This led to severe overconfidence for 90% intervals and less overconfidence for 50% intervals. When participants were asked to produce intervals without a specified confidence level, overconfidence was reduced, but not eliminated. Furthermore, participants judged all outcomes that fell within an interval given by others (experts) as equally likely to be correct. This shows that interval judgments are not easy for people to calibrate with respect to their knowledge, but they are a powerful tool for understanding human judgment. In this study, our interval method used the midpoints of successively converging intervals to determine a participant’s best estimate. This mitigates the effect of overconfidence leading to narrow intervals. To exemplify, the interval limits 200g and 400g have the midpoint 300g, and even if a more confident person instead answered 250g to 350g, the midpoint would still be 300g.

The size-weight illusion

Augustin Charpentier [9] was the first to investigate the SWI experimentally. As an example of how strong the illusion can be, Charpentier showed that for two copper balls weighing 266 grams each, 200 grams had to be added to the larger ball to make it feel as heavy as the smaller ball [9]. The illusion persists even when objects are held without being seen [10] or pushed instead of held or lifted [11], but the strength of the illusion can be moderated by the style of lifting [12]. The SWI has been thoroughly investigated in many ways, for example by measuring expectations of size or volume without touching the objects [13], and has been suggested to arise due to competing prior beliefs about relative volume and density between objects [14].

Most SWI-studies use some subjective measure of heaviness (e.g. magnitude estimation), see Saccone, Landry, & Chouinard (2019) for the first meta-analysis on the topic [4]. In our investigation of self-selected intervals, we asked the participants about the true weight of different boxes, rather than their subjective perception of weight. By asking for the weight in grams, we could test if interval judgment midpoints are more accurate than point estimates. In addition, we can find if the intervals will include the true values. We were also able to test if the SWI holds for estimates of true weight.

SWI has previously been found to hold for judgments of true weight in an exercise study [15], and we will try to replicate that finding as well. Furthermore, the SWI is a reliable illusion to include in an experiment of method comparisons because it is resistant to factors such as whether or not the weights are lifted simultaneously [16], and adjustments of the sensorimotor system due to repeated lifting [17, 18]. Furthermore, focusing on weight, rather than size, has been found to increase the illusion [19]. By asking for the true weight of objects participants should have this as their primary focus and give a strong illusion. This should facilitate finding potential differences between the methods.

Summary of study aims

The overarching purpose of the present study was to investigate how well self-selected intervals can be applied to a weight judgment task, and compare it to the traditional method of point judgments. First, we wanted to verify that the size-weight illusion applies to judgments of physical weight in grams in the same way as it applies to pure weight perception measured with magnitude estimation methods. Then, we wanted to compare the accuracy of estimates generated by the interval and point judgments; which method generates estimates closest to the true weight of an object? We also wanted to compare the strength of the illusion across the two different judgment methods. Finally, we wanted to find out if the self-selected interval method can help participants to converge towards their true best estimate. We wanted to know if the intervals, and the consecutive splits, result in more consistent estimates within each individual than do point judgments. In general, we showed in the experiment of this study that the interval method may have some benefits for measurements of human judgments of weight, and can reduce the size-weight illusion in some cases, but that there are limitations to the potential benefits of the method.

Materials and methods

Participants

A total of 32 participants were recruited from Studentkaninen.se (a participant pool for experiment participation at Stockholm University) and among students at the Department of Psychology, Stockholm University. The average age was 31 years (SD = 9). There were 15 men and 17 women, of whom 28 were right-handed and 3 left-handed; one participant did not report a dominant hand. The experiment involved two experimental conditions, one for each judgment task, point judgments and self-selected interval judgments. The two conditions were separated by at least two days. One participant did not show up for the second condition and was excluded from the analysis. Participants were rewarded SEK 200 (about € 20) in vouchers that could be used in a wide variety of stores. Furthermore, there was an extra reward of SEK 500 (about € 50) in vouchers to the participant who gave estimates closest to the correct weights, and the second closest participant was rewarded an extra SEK 200.

Design

The experiment had a factorial design with the within-subject factors size of box (3 levels) and weight of box (5 levels per box size). The weights were judged with point estimates and self-selected intervals on separate days. A detailed description follows in the sections: stimuli, judgment methods, and procedure.

Stimuli

The stimuli consisted of 15 cardboard boxes of 3 different sizes, thus 5 boxes of each size. The boxes were (approximate) cubes with side lengths of 6.8 cm (small box), 12.3 cm (medium box) and 22.4 cm (large box). Counting the lid of the boxes, the heights were 7.4 cm (small box), 12.8 cm (medium box) and 22.8 cm (large box). The size of the boxes made it impossible to make the heaviest small box as heavy as the heaviest large box. For this reason, only weight ranks 3–5 were represented in all three sizes. To center the entire stimulus set around one parameter, the density 0.33 g/cm³ was represented in all three sizes, although density was not a main research interest for this study. Table 1 gives the weight and density of each box; see also the photo in Fig 1. Densities were calculated using the volume including the height of the lid (e.g. 6.8 × 6.8 × 7.4 cm for the small box).
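As a rough check of the density calculation described above, the following sketch computes volume (side × side × height including the lid) and density for the three boxes that share the density of about 0.33 g/cm³. The function and variable names are illustrative and not taken from the study's scripts. Because the boxes were only approximately cubic, values computed from the stated dimensions may differ slightly from the rounded entries in Table 1.

```python
def density_g_per_cm3(weight_g, side_cm, height_cm):
    """Density as weight divided by the volume of a square-based box."""
    volume_cm3 = side_cm * side_cm * height_cm
    return weight_g / volume_cm3


# (weight in g, side in cm, height including lid in cm), as stated in the text and Table 1
boxes = {
    "lightest small box": (114, 6.8, 7.4),
    "medium-heavy medium box": (649, 12.3, 12.8),
    "heaviest large box": (3808, 22.4, 22.8),
}

densities = {name: density_g_per_cm3(*dims) for name, dims in boxes.items()}
for name, value in densities.items():
    print(f"{name}: {value:.3f} g/cm^3")
```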

Table 1. Weight and density of the boxes used as stimuli in the experiment.

              Weight [g]                Density [g/cm³]
Weight rank   Small   Medium  Large     Small   Medium  Large
1             114     -       -         0.33*   -       -
2             235     235     -         0.68    0.12    -
3             455     455     455       1.31    0.23    0.04
4             649     649     649       1.87    0.33*   0.06
5             1140    1140    1140      3.29    0.58    0.10
6             -       2199    2199      -       1.12    0.19
7             -       -       3808      -       -       0.33*

The density was the same for three of the boxes (the lightest small box, the medium-heavy medium box and the heaviest large box), marked with an asterisk. A dash indicates that no box of that size had that weight.

Fig 1. Photograph of the arrangement of the 15 boxes before the first session.

Participants were asked to judge the weight of the mug, shown in the photo, before the experiment as an introduction to the lifting task. The experimenter sat behind the computer screen at the back and recorded the responses from the participant as they estimated the weight of the boxes one by one, starting with the medium box at the top right of the photo.

To increase the weight of the boxes, balance weights were glued along the inside walls in a symmetrical pattern. By spreading out the weights along the walls in a symmetrical pattern, the boxes felt like a uniform weight, even when tilted slightly. The balance weights were 5g and 10g cuboid pieces of iron, strung together with adhesive tape. To increase the stability of the large boxes, wooden sticks were placed as a cross, horizontally, at mid-height of the box. The large boxes were further reinforced with duct tape on the inside of the corners. To adjust box weights with a precision of a tenth of a gram, the boxes were filled with cotton. In cases where the final 5g and 10g weights could not be fitted symmetrically on the walls, single weights were embedded in the center of the cotton (e.g. if 10g was needed, it could only be split into 2 x 5g weights, which could not be distributed symmetrically around 4 walls). See Fig 2 for an illustration of the boxes’ construction.

Fig 2. Photograph of the inside of the boxes and the added weights.

The photograph illustrates examples of how the box weights were adjusted in each of the three different sizes used in our experiment. Single 10g and 5g weights can be seen on top of the lid at the top of the photo.

Judgment methods

The weights of the boxes were estimated by the participants and reported by point judgments and self-selected interval judgments. Each judgment method was conducted in a separate session. The instruction for point judgment was simply to judge the true weight of the boxes in grams or kilograms.

The instruction for self-selected interval judgment was to first give an interval (two numbers) representing the smallest and the largest weight the box reasonably could have (i.e. it would be unreasonable to the participant that the true weight was outside the limits of the interval), and then answer three follow-up questions. The first follow up question asked if the box weighed less or more than the arithmetic midpoint of the first interval. The binary answer to this question was used to define a second interval half the width of the first interval, and this procedure was repeated two more times to generate a third and a fourth interval. The midpoint of the fourth and final interval was used as the estimate of a participant’s best guess. This example illustrates the procedure:

  1. A participant judged the weight of a given box to be between 100 and 200 g. Thus, the first interval was [100, 200].

  2. The participant was then asked whether the box was lighter or heavier than 150 grams (midpoint of first interval), and answered ‘heavier’. From this answer, a second interval was derived: [150, 200].

  3. The participant was then asked whether the box was lighter or heavier than 175 grams (midpoint of second interval), and answered ‘lighter’. From this, a third interval was derived: [150, 175].

  4. Finally, the participant was asked whether the box was lighter or heavier than 162 grams (midpoint of third interval, rounded to closest even integer), and answered ‘heavier’. From this, a fourth interval was derived: [162, 175].
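The splitting procedure in the example above can be sketched in a few lines of code. This is a minimal illustration, not the software used in the experiment: the function name and the answer labels are hypothetical, and "rounded to closest even integer" is interpreted here as round-half-to-even, which is what Python's built-in round does. Whether the final midpoint was rounded is not stated, so it is left unrounded.

```python
def split_intervals(lower, upper, answers):
    """Return the initial interval followed by one halved interval per answer."""
    intervals = [(lower, upper)]
    for answer in answers:
        low, high = intervals[-1]
        # Midpoint of the current interval; ties such as 162.5 go to the even integer.
        midpoint = round((low + high) / 2)
        if answer == "heavier":
            intervals.append((midpoint, high))
        elif answer == "lighter":
            intervals.append((low, midpoint))
        else:
            raise ValueError(f"unexpected answer: {answer!r}")
    return intervals


# The worked example from the text: first interval [100, 200] and three answers.
intervals = split_intervals(100, 200, ["heavier", "lighter", "heavier"])
final_low, final_high = intervals[-1]
best_estimate = (final_low + final_high) / 2  # midpoint of the fourth interval

print(intervals)
print(best_estimate)
```

With the answers from the example, the sketch reproduces the four intervals listed above, and the midpoint of the fourth interval serves as the participant's best estimate.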

Procedure

Prior to the experiment, the participants signed an informed consent form to participate in the study. As a general introduction to the experiment, the participant was asked to pick up a mug (249g) from the table and assess its weight. Furthermore, this gave us an indication of whether a point or an interval judgment would be the most natural way to report weight estimates (it turned out that the large majority gave a point judgment).

The participants were then given a written instruction describing the box weight estimation procedure. There was one instruction for the point judgment task and another instruction for the self-selected interval judgment task.

For the lifting part of the procedure, the participants were instructed to first lift the boxes with two hands, and then place the box in the palm of one hand. The experiment leader illustrated a motion where the box was lifted and held with the elbows at approximately a 90-degree angle, for both the two-hand and the one-hand grip. After holding the box with two hands and then one hand, as instructed, the participants were allowed to hold the box in any way (and for as long as) they liked. The only restriction on handling the box was that participants were not allowed to open it, or to shake or tilt it in a way that could compromise its integrity. While holding the box they gave their estimate of the box’s weight as either a point or an interval.

The 15 boxes were randomized into three different sequences (orders of the boxes). There were three sessions, one sequence per session. The first sequence of 15 boxes was set up before the participant entered the experiment room for the first time. After judging each of the 15 boxes once, in the order of the presented sequence, the participant was asked to leave the room while the experimenter set up the second sequence of the same 15 boxes. However, the participants were not informed whether or not it would be the same set of boxes for the second session. The participant then re-entered the room for the second sequence. This was repeated for the third and final session. The three session sequences were the same for the point and interval conditions.

In the point estimate condition, the experimenter entered the judgment into the computer and asked the participant to move on to the next box. In the interval condition, the experimenter entered the interval judgments. After entering the interval judgments, the computer returned the midpoint of the interval (arithmetic mean rounded to the closest even integer) and the experimenter asked the participant if the box was lighter or heavier than this midpoint. The experimenter entered the answer into the computer, which then, again, returned the midpoint of the new, halved, interval and the procedure was repeated.

The participants took part in the point and interval judgment conditions on two different days, with at least two days in between. The order of the conditions was randomized among the participants; half of the participants started with point judgments and the other half with interval judgments.

After having completed the box-lifting task during the second day, the participants filled out a questionnaire regarding reference weights. They were asked if they, during the weight estimation task, thought of any weight(s) they were familiar with from before the experiment, and if they used some specific strategy when they made their weight estimations. Finally, the participants were asked to judge the volume in cubic centimeters (cm³) of the three different box sizes.

Statistical analysis

The main tool for statistical analysis was R v.4.0.3 [20] with RStudio v.1.3.1093 [21], including the packages nlme v.3.1–149 [22], lme4 v.1.1–26 [23] and sjPlot v.2.8.7 [24]. Jamovi 1.6.23.0 [25] was used as a secondary tool. The data files and scripts for the analyses can be found at https://figshare.com/s/92fc9e629e2a3b2a2343

Results

The results section is divided into three main parts: the size-weight illusion, accuracy of weight judgments, and stability of judgments. Within each part we will first examine the point judgments, then the self-selected interval judgments, and finally compare the two measures. The main size-weight illusion issue is to what extent boxes that weigh the same but are of different size are judged to weigh differently. The main accuracy of judgment issue regards how close judgments are to the true physical weight. The main stability of judgment issue regards how consistently a person judges the same box across different sessions.

It is important to note that complete accuracy means that there is no illusion, because in that case, boxes of the same weight would be judged to weigh the same independent of box size. However, a reduced size-weight illusion does not automatically mean better accuracy. For example, if the judgments of a small and large box were 500g and 600g the average would be 550g. The same average would be obtained by the judgments 400g and 700g. If the true weight was, for example, 300g, the first and second pair of judgments would deviate 250g on average from the correct weight, but the second pair of judgments (400g and 700g) would mean a greater size-weight illusion than the first pair (500g and 600g).
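The arithmetic in this example can be made explicit with a short sketch. The variable names are illustrative, and the gap between the two judgments is used here only as a simple stand-in for the size of the illusion within a pair.

```python
true_weight = 300  # grams, as in the example above

# Two hypothetical pairs of judgments for a small and a large box of equal weight.
pairs = {"first pair": (500, 600), "second pair": (400, 700)}

summary = {}
for label, (judgment_a, judgment_b) in pairs.items():
    mean_abs_error = (abs(judgment_a - true_weight) + abs(judgment_b - true_weight)) / 2
    gap = abs(judgment_a - judgment_b)  # larger gap = stronger effect of size
    summary[label] = (mean_abs_error, gap)
    print(f"{label}: mean deviation {mean_abs_error} g, gap {gap} g")
```

Both pairs deviate by the same amount from the true weight on average, while the gap between the judgments is three times as large in the second pair.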

To summarize our data, Fig 3 illustrates the aggregate response pattern. The aggregate results are presented here primarily as a descriptive tool to give a general sense of this experiment’s results as a whole; detailed analyses will follow section by section. The diagonal line is a reference line for correct responses. The distance between judgments of boxes of the same weight illustrates the size-weight illusion (SWI). Note that the colored fields illustrate the average range of the interval judgments and the points illustrate the average point judgments. For each consecutive split the colored fields become darker; thus, when a point falls on the darkest area of the field, it falls within the limits of the fourth and final interval, which was 1/8 the width of the initially self-selected interval. Fig 3 also illustrates that the responses increase exponentially as a function of weight for all box sizes, and that the average judgments show a linear trend when they are log transformed. For this reason, the log transformed values were used in our linear models comparing accuracy.

Fig 3. Average judgment of each box’s weight.

Points illustrate average point estimates, and the boundaries of the colored fields are determined by the average upper and lower boundary judgments of the self-selected intervals. The colored fields become darker with each consecutive split of the intervals. Hence, the lightest color indicates the average width of the first interval (widest) and the darkest color indicates the average width of the fourth and final interval (narrowest). The aggregate results are stable across sessions, as shown in the appendix with plots similar to Fig 3 for each separate session. Importantly, there was great individual variability in the judgments, as shown with figures for each participant in the appendix.

The size-weight illusion

Point estimate

First, we wanted to confirm that our experiment replicated the size-weight illusion (SWI). We calculated the average point judgment for each of the boxes. Averages were calculated across all participants and trials, and Fig 3 shows a clear general size-weight illusion. Hence, we have shown that the illusion holds for judgments of the actual weight of an object. Judging actual weight is a somewhat different task from the task in most studies of SWI, which commonly use magnitude estimation, and sometimes reference weights [4]. The distinction is important because judging the true weight directly does not explicitly lead the participant to make direct comparisons of specific boxes’ weights. However, it should be noted that written subjective reports after the experiment showed that most participants thought of one or several reference weight(s) while making their judgments; for example, a 1-liter carton of milk, which weighs approximately 1kg.

To further check for the robustness of the size-weight illusion we counted the number of cases in which, within a session, a smaller box was judged as heavier than the box one size larger of the same true weight (i.e. for each weight, small was compared to medium and medium compared to large). We will refer to these as SWI-cases. The comparisons were made only within each session (in each condition there were 3 sessions with 15 trials). To clarify, the weights 235g, 455g, 649g and 1140g were present both as small and medium boxes, generating 4 comparisons of boxes of the same weight, but different size, per session. The weights 455g, 649g, 1140g and 2199g were present in both medium and large boxes, generating another 4 comparisons per session. This means that the theoretical maximum number of SWI-cases per participant was 24, for each type of judgment. The lightest weight (114g) was only present for the small box, and the heaviest weight (3808g) was only for the large box, thus these weights could not be compared across sizes.

The number of observed SWI-cases per participant ranged from 16 to 24 with a median of 23, and the mode was 23 as well (11 of 31 participants). Only 7 participants showed fewer than 21 SWI-cases (Q1). Note that 12 SWI-cases are expected by chance if a participant judges the weights at random. This applies also if a participant experiences no illusion, because of lack of consistency (identical judgments occurred only in 31/744 = 4.2% of cases). Summing all SWI-cases across participants showed that boxes were judged heavier than the one size larger counterpart 90% of the time (667 of 744 comparisons).
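The counting of SWI-cases within one session can be sketched as follows. The data layout (a dictionary keyed by box size and true weight) and the judgment values are hypothetical and chosen only for illustration; the comparison pairs follow the description above. Identical judgments are not counted as SWI-cases in this sketch, because the smaller box must be judged as strictly heavier.

```python
SMALL_MEDIUM_WEIGHTS = (235, 455, 649, 1140)   # weights present as small and medium boxes
MEDIUM_LARGE_WEIGHTS = (455, 649, 1140, 2199)  # weights present as medium and large boxes


def count_swi_cases(session):
    """Count comparisons where the smaller box was judged heavier than its larger twin."""
    pairs = [("small", "medium", w) for w in SMALL_MEDIUM_WEIGHTS]
    pairs += [("medium", "large", w) for w in MEDIUM_LARGE_WEIGHTS]
    return sum(session[(smaller, w)] > session[(larger, w)] for smaller, larger, w in pairs)


# Hypothetical judgments (grams) for one participant in one session.
session = {
    ("small", 114): 150, ("small", 235): 300, ("small", 455): 600,
    ("small", 649): 900, ("small", 1140): 1500,
    ("medium", 235): 200, ("medium", 455): 400, ("medium", 649): 700,
    ("medium", 1140): 1600, ("medium", 2199): 2500,
    ("large", 455): 300, ("large", 649): 500, ("large", 1140): 1000,
    ("large", 2199): 2000, ("large", 3808): 3500,
}

swi_cases = count_swi_cases(session)
print(swi_cases, "of 8 comparisons in this session")
```

Summing this count over the three sessions of a condition gives the per-participant total, with a maximum of 24.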

Self-selected intervals

The robustness of the SWI holds for self-selected intervals and each of the consecutive splits, which is illustrated by the group data in Fig 3. The figure illustrates that average intervals do not overlap between boxes of the same weight, which illustrates the strength of the SWI on the aggregate level. Non-overlapping intervals were also found within individuals. If a participant perceived that two boxes could not possibly weigh the same, the first intervals (the widest) of those two boxes should not overlap. We calculated the number of cases in which the lower interval limit of a smaller box was judged larger than the upper interval limit of the one-size-larger box of the same weight. Interval limit comparisons were carried out within each participant and session, as for the SWI-case analysis with point judgments. The average number of cases in which a participant found it unreasonable that the boxes could weigh the same was 12.29/24 (51%), the range was 2/24 to 23/24, the median was 11/24 (46%), and the mode (5 participants) was 19/24 (79%). Note that participants chose the limits of their intervals freely (50% would have been the prediction for random responses to the question “is it possible that the boxes weigh the same, yes or no?”).
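The interval-limit comparison described above amounts to a simple non-overlap check on the first intervals. A minimal sketch, with a hypothetical function name and made-up interval values:

```python
def judged_clearly_heavier(smaller_box_interval, larger_box_interval):
    """True if the smaller box's lower limit exceeds the larger box's upper limit."""
    smaller_low, _ = smaller_box_interval
    _, larger_high = larger_box_interval
    return smaller_low > larger_high


# Hypothetical first intervals (grams) for two boxes of the same true weight.
non_overlapping = judged_clearly_heavier((500, 700), (300, 450))
overlapping = judged_clearly_heavier((400, 600), (300, 450))
print(non_overlapping, overlapping)
```

In the first case the participant's intervals exclude the possibility that the boxes weigh the same; in the second case the intervals overlap, so equal weight remains within the reasonable range.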

Comparisons of point judgments and interval judgments

Even though the interval measures illustrate the strength of the SWI, we wanted to know if the influence of size on judgments of weight could be reduced with the self-selected interval method. To make interval judgments comparable to point judgments we used the midpoint of the intervals as a metric. Because the interval splitting procedure was successively converging, we will focus the following analyses on the first interval midpoint, called mid1, and the fourth and final interval midpoint, called mid4.

To calculate the number of SWI-cases for interval midpoints we used the same procedure as with the point judgments. That is, we calculated the number of times each participant’s estimate of a smaller box was greater than that of a box one size larger of the same true weight; descriptive statistics are given in Table 2.

Table 2. Descriptive statistics of SWI-cases per participant for the first and final interval midpoints (mid1 and mid4) and for the point judgments.
Measure  Min  Q1    Median  Q3  Max  Mode    Mean   SD    Sum  %
Point    16   21    23      23  24   23(11)  21.52  2.41  667  89.65
Mid1     13   19.5  21      22  24   22(9)   20.32  2.97  630  84.68
Mid4     14   19    22      23  24   22(7)   20.71  2.84  642  86.29

Number of participants per mode in parentheses (out of n = 31). Percentage of total SWI-cases was calculated as the sum of SWI-cases divided by 744 (the total number of comparisons).

First, we compared the number of SWI-cases per participant between point judgments and interval midpoints. We found that of the 24 comparisons per participant, the average number of SWI-cases was 20.3 (SD = 3.0) for mid1 and 21.5 (SD = 2.4) for point judgments. Hence, the number of SWI-cases per participant was on average 1.2 lower for interval mid1 than for point judgments, t = -2.7, df = 30, p = 0.012 (two-tailed), 95% CI [-2.1, -0.3].
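The reported degrees of freedom (df = 30 with 31 participants) are consistent with a paired comparison of each participant's two counts. The sketch below shows how such a paired t statistic is computed; it is written in Python for illustration and is not the authors' R script, and the counts are made up rather than taken from the study data.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two equally long samples."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical SWI-case counts for five participants (not the study data).
mid1_counts = [20, 19, 22, 21, 18]
point_counts = [22, 21, 23, 21, 20]

t_value, df = paired_t(mid1_counts, point_counts)
print(f"t = {t_value:.2f}, df = {df}")
```

A negative t value indicates fewer SWI-cases for mid1 than for point judgments, matching the direction of the effect reported above.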

To further analyze the effects of size on weight judgments depending on judgment type, we compared the strength of the SWI between point judgments and interval mid1 and mid4. The metric for SWI strength was calculated as the difference between the weight judgments of a box and its one-size-larger counterpart, divided by the lighter of the two judgments. This measure of SWI will be referred to as the SWI-factor from here on. To exemplify, if a small box was judged to weigh 500g and the medium box 400g, the SWI-factor is (500 − 400)/400 = 0.25 (25% heavier smaller box). If instead the medium box was judged the heavier, in the example above, we would get (400 − 500)/400 = −0.25 (25% lighter smaller box). There were 8 comparisons per session, which gives a total of 24 comparisons per judgment method and participant. We found some large outliers, especially in the point condition, where one SWI-factor was as large as 49. We excluded SWI-factors greater than 10 and smaller than −10. The number of outliers removed was 16 for the point measure, 12 for mid1 and 13 for mid4; that is, approximately 2% of the 744 SWI-factors originally computed per condition were excluded. See Table 3 for descriptive statistics before and after removing the outliers.
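The SWI-factor and the outlier rule can be expressed compactly in code. This is a sketch under the definition given above (difference between the two judgments divided by the lighter judgment); the function names are hypothetical.

```python
def swi_factor(smaller_box_judgment, larger_box_judgment):
    """Proportional difference between the judgments, relative to the lighter one."""
    lighter = min(smaller_box_judgment, larger_box_judgment)
    return (smaller_box_judgment - larger_box_judgment) / lighter


def drop_outliers(factors, limit=10):
    """Keep only SWI-factors within [-limit, limit]."""
    return [factor for factor in factors if -limit <= factor <= limit]


# The two worked examples from the text.
heavier_small = swi_factor(500, 400)   # small box judged heavier
heavier_medium = swi_factor(400, 500)  # medium box judged heavier
print(heavier_small, heavier_medium)

# Illustrative values only: the extremes are excluded by the rule above.
filtered = drop_outliers([0.25, 49.0, -14.0, 0.86])
print(filtered)
```

Dividing by the lighter judgment makes the measure symmetric, so that the same pair of judgments yields factors of equal size and opposite sign depending on which box was judged heavier.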

Table 3. Descriptive statistics of weight judgments of a smaller box proportional deviation from the corresponding box that was one size larger.
Measure Min Q1 Median Q3 Max Mean SD
Point -14.00 0.33 0.86 1.69 49.00 1.69 3.70
Mid1 -36.00 0.21 0.67 1.60 25.67 1.24 2.95
Mid4 -39.91 0.23 0.71 1.58 24.50 1.25 3.06
Filtered (SWI-factors smaller than -10 or greater than 10 removed)
Measure Min Q1 Median Q3 Max Mean SD
Point -1.09 0.33 0.87 1.62 9.00 1.33 1.66
Mid1 -4.00 0.21 0.67 1.50 9.00 1.09 1.49
Mid4 -4.03 0.22 0.71 1.51 9.68 1.08 1.43

Positive values indicate how much heavier, proportionally, the smaller box in a pair of same-weight boxes was judged (i.e. 0.86 means 86% heavier). Negative values indicate the same, but with the larger box judged as the heavier one (reverse-SWI).

Because the weights were different for SWI-factors of small/medium comparisons (weights 235g – 1140g) and medium/large comparisons (weights 455g – 2199g), we analyzed the data separately for each of these comparison types. We also excluded outliers and used only the data described in the bottom half of Table 3. A repeated measures ANOVA with the factors 2 (measure: point or mid1) x 4 (weight) x 3 (trial session) showed a significant main effect of measure for both the small/medium comparisons, F(1,30) = 5.636, p = 0.024, η2p = 0.158, and the medium/large comparisons, F(1,30) = 6.275, p = 0.018, η2p = 0.173. This indicates that the SWI is lower with interval judgments than with point judgments. However, the SWI-factor peaked for the point measure, while it remained at a similar level for lighter and heavier weights for the interval mid, as illustrated by Fig 4. This peak was found at the second weight in rank order for both the small/medium comparisons (455g) and the medium/large comparisons (649g). The interaction effect between weight and measure was significant for the medium/large comparison, F(3,90) = 4.08, p = 0.009, η2p = 0.120, but not for the small/medium comparison, F(3,90) = 2.49, p = 0.065, η2p = 0.077. We had no a priori hypothesis about this specific peak, so this finding should be considered exploratory only. Importantly, the main effect should be interpreted cautiously in the presence of a significant interaction [26].

Fig 4. Average SWI-factor depending on weight and judgment method.

Separate lines for SWI-factor for small/medium (solid) and medium/large (dotted) comparisons. Whiskers indicate 95%-CI’s.

Our interpretation is that the interval method potentially may reduce SWI, but only by limiting responses indicating large illusions rather than reducing the illusion in general. In other words, the point judgments peak that is absent in the interval condition may imply that the interval method puts an upper limit to the illusion. The pattern of a peak at the second lightest weight holds across all three sessions, with the only exception that in session 3 the lightest medium/large comparison (455g) led to the greatest SWI-factor, implying some reliability of this finding. Fig 5 illustrates the average SWI-factor across sessions. Mid4 was included to show that there is no clear reduction (or increase) in SWI-factor following the splitting procedure (narrowing the intervals), and we did not analyze the SWI-factor for mid4 further for this reason.

Fig 5. Average SWI-factor depending on weight and judgment method for each judgment session.

Separate lines for SWI-factor for small/medium (solid) and medium/large (dotted) comparisons.

To summarize, the size-weight illusion is very robust, and holds to the extent that judgment intervals often do not even overlap for boxes of the same weight but different size. The interval method seems to reduce illusions above a certain level, that is, the interval judgment method puts a limit on the measured illusion.

Accuracy of weight judgments

To quantify the accuracy of judgments we calculated the proportional absolute deviation, pad:

$$\mathrm{pad} = \left| \frac{j}{w} - 1 \right| \tag{1}$$

where j is a participant’s estimate (e.g. a point judgment or one of the midpoints of an interval) and w is the true weight of the judged box. To exemplify, for a box of weight w = 200, judgments of j = 100 and j = 300 would both result in pad = 0.5. The average pad for point judgments was 56.5%. For interval judgments, the average pad was 56.0% for the first interval midpoint and 55.6% for the fourth and final midpoint. The average pad became marginally but monotonically smaller with each split, but the 0.4 percentage point reduction from the first to the final midpoint was not statistically significant, and neither was the difference from point judgments (dependent two-tailed t-tests).
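As a small worked example, the proportional absolute deviation in Eq (1), that is, the absolute value of j/w minus 1, can be computed as below. The function name `pad` and its parameter names are chosen here for illustration only; the two calls reproduce the example from the text.

```python
def pad(judgment_g: float, true_weight_g: float) -> float:
    """Proportional absolute deviation |j / w - 1| of a judgment j from true weight w."""
    return abs(judgment_g / true_weight_g - 1)


# A 200 g box judged as 100 g or as 300 g deviates by the same proportion.
print(pad(100, 200))  # 0.5
print(pad(300, 200))  # 0.5
```

Because the deviation is expressed relative to the true weight, under- and overestimates of the same absolute size for a given box receive the same score.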

To estimate and compare the accuracy of point judgments to the accuracy of the final midpoint of the intervals (mid4), across the range of weights and sizes of boxes, the logarithm of the measures was modeled by a mixed effect model using restricted maximum likelihood estimation. The logarithm was used because then the response pattern for each box becomes approximately linear. This is illustrated by Fig 3. We followed the modeling strategy suggested by Zuur et al. [27] to give less biased estimators of the variance components. Therefore, our final model uses restricted maximum likelihood estimation.

The results indicate that the final interval judgments (mid4) are marginally closer to the correct judgments compared to point judgments. The model results are illustrated in Fig 6 (numerical estimates can be found in S1 Table) which also shows that all fixed-effects were significant. The figure clearly shows the size-weight illusion in both conditions. The slopes become steeper with increasing box size indicating that the largest box was the hardest to judge accurately within the range of weights that we used in this experiment. Furthermore, the difference between the lines for point judgments and mid4 is greatest for the largest box, indicating that the interval method may improve accuracy more when the task becomes more difficult.

Fig 6. Slope estimates for log of point judgments and log of mid4 in grams.

Main effects: the log of the true box weight, the type of judgment estimate (point vs mid4), and size (small, medium, large). The model also contains first-order interaction effects and random effects of log of true weight, type of judgment estimate, and size. The numerical model estimates can be found in S1 Table.

Further exploratory analysis of the self-selected interval accuracy

To further investigate the accuracy of the intervals, we calculated how many intervals included the true value; we call this a “hit”. A hit was defined as the true weight being smaller than the upper limit and greater than the lower limit reported. Of the 1395 intervals (31 participants * 45 judgments), only 290 hit the true value. On average, participants hit the true value with 9.35 of their 45 intervals, a hit rate of 20.8%, SD = 19.3%. The least accurate participant had 0% hits and the most accurate had 75.6% hits. The hit percentage of a participant correlated strongly with the participant’s average interval width, r = 0.82, 95% CI [0.65, 0.91], t = 7.63, df = 29, p < .001, and with interval width relative to the true weight of a box, r = 0.76, 95% CI [0.56, 0.88], t = 6.37, df = 29, p < .001. Furthermore, hit rates were on average higher for small boxes (28.4%) than for medium boxes (20.6%) and large boxes (13.3%).

It should be noted that the hit-rate for the intervals was approximately halved for each consecutive split of the interval, 1st interval 20.8%, 2nd interval 10.1%, 3rd interval 4.4%, 4th interval 2.3%. For each split, half the hit rate for half the interval width is expected by chance. In other words, when participants managed to select intervals that included the true value, they did not manage to keep the true value within their selected interval boundaries any more than if the split intervals were selected at random.
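The chance baseline in this argument can be made explicit with a short sketch. It assumes, as the text does, that once an interval contains the true value, a randomly chosen half contains it with probability 0.5. The function name `is_hit` and the variable names are hypothetical; the observed percentages are those reported above.

```python
def is_hit(lower_g: float, upper_g: float, true_weight_g: float) -> bool:
    """A hit: the true weight lies strictly between the reported limits."""
    return lower_g < true_weight_g < upper_g


# Observed hit rates (%) for the 1st to 4th interval, as reported in the text.
observed = [20.8, 10.1, 4.4, 2.3]

# Chance expectation: each split keeps the true value with probability 0.5,
# so the hit rate of the first interval is halved once per split.
expected = [observed[0] / 2 ** split for split in range(len(observed))]

for number, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"interval {number}: observed {obs:.1f}%, chance {exp:.1f}%")
```

Under this assumption the chance values are 10.4%, 5.2% and 2.6% for the second to fourth interval, close to (and slightly above) the observed 10.1%, 4.4% and 2.3%.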

We also calculated how often participants captured their own point judgment within the interval judgment of that same box in the corresponding session (judged upper limit ≥ point judgment ≥ judged lower limit). That is, a point within the limits of the corresponding interval judgment was coded as 1 (captured by the interval) and outside the limit as 0 (not captured by the interval). As mentioned, the sequence of boxes in a session (1 to 3) was the same in the point and interval conditions. Participants’ average proportion of point judgment captured by the corresponding intervals was 42.5% (SD = 19.2%). This is an important finding because it suggests that more often than not participants did not find their point judgments as reasonable weights according to the interval limits. The order of the tasks did not seem to affect how often point judgments were captured by intervals, 41.3% (SD = 18.9%) for participants doing the interval judgments the first day and 43.6% (SD = 20.1%) for point judgments the first day. This indicates that there were no calibrating effects of one type of judgment over the other.
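The 1/0 coding of captured point judgments can be sketched as follows. Note that, unlike the hit definition, the limits are inclusive here. The function names and the four example triples are made up for illustration and are not taken from the data set.

```python
def point_captured(lower_g: float, upper_g: float, point_g: float) -> int:
    """Code 1 if the point judgment lies within the interval limits (inclusive), else 0."""
    return int(lower_g <= point_g <= upper_g)


def capture_proportion(judgments) -> float:
    """Proportion of (lower, upper, point) triples where the point is captured."""
    codes = [point_captured(lower, upper, point) for lower, upper, point in judgments]
    return sum(codes) / len(codes)


# Hypothetical judgments of four boxes by one participant: (lower, upper, point) in grams.
example = [(300, 500, 450), (300, 500, 600), (200, 400, 400), (700, 900, 650)]
print(capture_proportion(example))  # 0.5
```

Averaging such proportions across participants gives the kind of summary value reported in the text (42.5%).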

To find out to what extent the interval width could explain the cases where intervals covered the points, we first regressed points covered on interval width, across all participants’ judgments, and found that only 1% of the variance in points covered was accounted for by the interval width.

To account for intervals being wider for greater weight judgments, we regressed points covered on interval width relative to its midpoint (interval width/mid1, the relative interval width). The relative interval width accounted for 14% of the variance in points covered. This indicates that intervals are not well calibrated towards the point judgments and may employ a different process.

We also wanted to know to what extent the average interval width could explain the proportion of points covered. Therefore, we also calculated the average interval width for each participant, M = 347g, SD = 337g, and average relative interval width, M = 0.49, SD = 0.30. Then, we regressed the proportion of points covered on the average interval width, R2 = 0.22, b = 2.65, F(1, 29) = 7.941, p = .009 and average relative interval width, R2 = 0.64, b = 0.52, F(1, 29) = 50.73, p < .001. This may indicate that some people adapted the range of their intervals to the magnitude of their point judgment and were more calibrated towards their point judgments. This is corroborated by the individual plots found in the appendix which shows that some participants increased the width of their intervals as boxes got heavier and judgments larger, while other participants used narrow intervals across the range of weights. Furthermore, the intervals and points may differ in regard to what judgment is reasonable to derive from our perceptions.

Variation of judgments depending on condition

We wanted to know if individuals’ weight judgments would vary less when they use intervals rather than points (regardless of whether the judgments were accurate). In other words, we wanted to find out if the judgments would become more consistent with the self-selected interval method. To be able to compare box judgment variation across measures, while taking into account individual variation of judged box weights across boxes, we calculated the coefficient of variation for each participant. We calculated the standard deviation of each individual’s estimates (point judgment or interval mid) of each of the boxes and divided that standard deviation by the same individual’s average estimate of that box. Hence, the relative standard deviation of weight estimates was calculated from each participant’s judgments of each of the boxes in each of the conditions (point and interval). As an example, if a participant during the three sessions judged a box as 350g, 500g, and 650g (M = 500, SD = 150), the relative standard deviation for that participant and box would be 150/500 = 0.3. This is compared to the relative standard deviation of the interval midpoint of that same box, for example, 400g, 500g and 600g (M = 500, SD = 100), which would give 100/500 = 0.2 (a reduction of 0.1 in proportional variation).
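The worked example above can be reproduced with Python's standard library. The sketch assumes the sample standard deviation (n - 1 in the denominator), which is the variant that yields SD = 150 and SD = 100 for the example values; the function name `relative_sd` is hypothetical.

```python
from statistics import mean, stdev


def relative_sd(judgments_g) -> float:
    """Coefficient of variation: sample SD of repeated judgments divided by their mean."""
    return stdev(judgments_g) / mean(judgments_g)


point_judgments = [350, 500, 650]  # one box, three sessions, point condition
interval_mids = [400, 500, 600]    # same box, interval midpoints

print(round(relative_sd(point_judgments), 3))  # 0.3
print(round(relative_sd(interval_mids), 3))    # 0.2
```

Dividing by the mean makes the variation comparable across boxes of different weight, since heavier boxes produce larger judgments and therefore larger absolute spreads.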

The average relative standard deviation of point judgments (M = .321) was higher than that of both mid1 (M = .287, t30 = 1.89, p = 0.068) and mid4 (M = .280, t30 = 2.25, p = 0.032); however, the difference was significant only for mid4, not for mid1 (two-tailed t-tests). Averages are illustrated in Fig 7.

Fig 7. Relative standard deviation of box judgments depending on measure and size of the box.

Whiskers indicate 95%-CI.

It should be noted that the average relative standard deviation (SD/M) was greater for the lower limit judgments (M = 0.32) than the upper limit judgments (M = 0.28). Furthermore, in absolute terms the upper limit judgments varied more for all sizes and weights. This indicates that variation was larger because the numbers were larger. This supports the use of a relative measure of variance for comparisons of variation across methods.

To summarize, the self-selected interval method slightly reduced the variation of weight estimates of the same object. The reduction of average relative SD from point (.32) to mid4 (.28) suggests a potential for the method. It needs to be developed further, specifically regarding the extent to which the process of successively converging intervals can be used to stabilize people’s estimates of real-world parameters in an effective way.

Discussion

In this study, we have investigated an interval judgment method called self-selected intervals [2, 3] and compared it to traditional point assessments, using a weight judgment experiment. The weights were boxes of different sizes, and the size-weight illusion was present during all trial sessions. Two potential strengths of the self-selected interval method were found. First, the self-selected intervals reduced some of the SWI (size-weight illusion), primarily when the point judgment SWI was at its maximum. Second, the midpoints following the successively converging intervals (splitting method) gave somewhat more stable weight estimates compared to point judgments; participants showed less intra-individual variation in their weight estimates derived from the judged interval (mid1), and following the splitting procedure (mid4), than in their point judgments. However, the advantages of the interval method were small and we found only little support for the interval method as a way to get more accurate weight estimates. The differences between the estimates of the two methods were notably greater for both the largest cases of SWI and the largest box, which was the most difficult to judge accurately. This may indicate that the interval method becomes particularly useful when the task is difficult.

The interval method gave us a novel way to investigate the strength of the SWI. This is because interval judgments indicate the bounds on what a person would say is reasonable. Hence, no overlap between intervals indicates with greater certainty than point estimates that a participant differentiates between weights, even if the true weight is the same. For this reason, the interval method may be used as a new way to measure the strength of SWI. Furthermore, the interval method’s reduction, compared to point estimates, of the largest SWI in the data set may indicate that the measure can put a limit on the magnitude of the illusion. However, more research is needed to establish if the interval method actually reduced the illusion perceived, or if it is inherent in the method to limit extreme judgments in general, independent of what is judged.

Previous research has suggested that the illusion is impenetrable by cognition. This was shown by adding cognitive load to the task, and the illusion remained the same [28]. Assuming that interval judgments demand more cognitive effort than traditional point judgments, our results indicate that there may be a threshold for which illusions greater than that threshold may be limited by cognition. Furthermore, we do not yet know if it is the cognitive effort added to the task by using interval judgments or if judgments in general become less extreme with the method, as mentioned above. There is an important methodological difference highlighted here, because the study by Saccone et al. (2019) added cognitive load as a distraction while we added cognitive effort to the actual judgment process of the main task. If weight judgments are generally made based on sensory input, and little cognition is present to begin with, there is not much cognition to distract by adding cognitive load. On the other hand, if the experimental method elicits additional cognitive effort, this may start to moderate the resulting judgments or estimates given by the participants.

Earlier studies of the self-selected interval method have used measures of subjective preference, for example willingness to pay [3, 29]. In this study, judgments were regarded as objective true values, and judgments were most likely primarily based on perception, that is, information derived from sensory systems such as sight and haptics. Because we asked for the true weight of the boxes, a cognitive process was assumed to follow (and this assumption is strengthened by the participants’ retrospective reports which indicated that they thought about other objects’ weight, such as a milk carton of 1kg, while trying to estimate the true weight of a box). All participants stated that they used a weight they are familiar with in real life as a reference for their judgments, with 1kg being the most common weight used as a reference, and a 1-liter milk carton being the most common object used as a reference object for that weight. The second most common objects were exercise weights. Importantly, some participants referred to only one weight, while others referred to several weights, or a range of weights. Reported reference weights ranged from 10g to 10kg. Whether or not the interval method is helpful for a person may depend on the range of references used, as this includes a wider base of prior knowledge used to determine one’s judgment to respond with. For the present study we asked for open-ended responses for exploratory purposes. Because these responses were similar in nature, and generally clear, we encourage future studies to make the questions more specific so that these variables can be parameterized in an appropriate way to determine if some people may or may not benefit from judging with the self-selected intervals. Furthermore, the density of a participant’s reference weights may have affected their weight judgments. S7 Fig shows that the density derived from the judgments deviates more from the true density the larger the box is.
If the size affects the sensitivity to density, the reference weight may have different effects depending on its size compared to the judged box. To exemplify, a 1kg exercise weight disc may be perceived as more similar to the heavy small box while a 1kg milk-carton may be more similar to the medium box. Experimentally trying to manipulate what reference weights are used may give interesting information about accuracy of judgments and comparisons between the methods.

Another important factor to consider for both weight judgment and interval judgments is the amount of experience with lifting objects with known weights. A person with a lot of experience may use appropriate reference weights for the object lifted, and therefore be better at judging accurately. A person with a lot of experience may also be familiar with how difficult a task actually is, and therefore be able to calibrate their interval limits to properly reflect the uncertainty of their judgments. Hence, the experienced person may be able to narrow the judged interval down to a more accurate guess. Examples of people with experience in lifting weights, within the range used in our experiment, are people exercising with weights and people working in a post-office. It may be that the strong cognitive component in trying to estimate the true weight makes the average judgment accuracy similar between the two types of judgments (point and interval). The size-weight illusion, on the other hand, is primarily sensory driven, and the interval judgments did not reach the same level of illusion as the point judgments did.

Furthermore, the interval-method was found to increase the accuracy of judgment slightly, and this may be due to the judgments being based primarily on sensory information. For this reason, future research of the self-selected intervals can benefit from comparing cognitive judgments about the objective reality with perceptual experiences, as well as to preferences (such as WTP which can include a strong emotional component). The usefulness of the interval method may depend on the primary source of information, sensory, cognitive/theoretical or preferences based on cognition, emotion, or a blend of the two. Assuming that the interval-method can “zoom in” on a person’s best, or true, response to a task, this process can be dependent on which mental systems are activated during that task.

Limitations and future directions

While making interval judgments, participants generally held the box for a longer period of time than while making point judgments, because of the more elaborate way of responding. Assuming that perceived heaviness increases with time due to increased muscle strain, this may have led participants to respond with somewhat heavier estimates in the interval condition. For further investigation of interval judgments of weight, the time a box is held should be standardized across conditions. Another aspect of this problem is that participants responded more times per box in the interval judgment condition, meaning that they may have adjusted simply because they responded several times. Hence, for future studies, to control for number of responses per item judged, there should also be a condition where participants respond several times for each box to account for the number of judgments made in the interval condition. Ideally, this can be compared to a condition with a single point estimate per box, but with the same amount of time holding the box (suggested above). This way the factor ‘time holding an item’ can be disentangled from the factor ‘number of judgments’ of that same item. This, in turn, would allow for an even more accurate comparison with the self-selected interval method. One problem with the repeated point judgment approach is that participants may perceive it as tedious, and/or unnecessary, to answer the exact same question several times consecutively. Therefore, the instructions should be very carefully worded, to avoid those problems. A suggestion may be to instruct participants to really think about the weight, and if their judgment of that weight is reasonable. This can also be used as a way to increase cognitive effort allocated to the task to see if there is a point where cognition can start to moderate the illusion (i.e. finding a threshold from which the illusion can be limited).

To gain more insight into the extent to which self-selected intervals are a useful tool in estimating and judging real world parameters, depending on if the task is perceptual, cognitive, or both, it would be useful to further investigate self-selected intervals with splitting in other contexts where the task is primarily cognitive, such as traditional judgment problems, and compare it to experiments primarily perceptive, such as magnitude estimation. We found that the successively converging intervals (splitting method), reduced the intra-individual variation of weight estimates to some extent, compared to point judgments. However, the estimates were stabilized within individuals only to a limited extent and a limiting factor may have been that participants understand “reasonable” in different ways (they were told to judge the greatest and smallest reasonable weight). Furthermore, we know from previous studies [7, 8] that people are generally too confident in interval accuracy. Many participants used very narrow intervals (see S1 Fig) which indicates that they had a very clear idea about what “reasonable” means and/or were overconfident in their ability to judge weights. Hence, working to standardize the procedure so that participants use the limits more similarly and ask for confidence judgments may help develop this particular method. We did not include confidence judgments because confidence was not a focal point per se. However, our data suggests that it may be useful in future developments.

Interestingly, even though the aggregate results (Fig 3) place point estimates close to the center of the intervals, when looking at each individual’s judgments, point judgments fell outside the corresponding interval limits for the most part. This means that findings from studies where best point and interval judgments were made one after another within the same task session, e.g. [30], may not generalize to situations where only one way of assessment is used. Furthermore, having people specify where in the interval they think the best guess is with the splitting method (which is similar to defining a point in the interval), may not necessarily lead towards an estimate that conceptually corresponds to a point judgment. The point estimation process may be different from the cognitive process used for intervals.

It seems reasonable to assume that the natural reason for using intervals in real life is due to a specific estimation that is deemed as difficult to make. However, the objective difficulty of the task (accuracy) and the subjective difficulty of the task (confidence) are not the same, but these factors can provide insights into how the self-selected interval method compares to point judgments. These factors may also be relevant for individual differences when intervals are used.

A way to gain a better understanding of how different people may use, and/or think in terms of intervals in real life, is to use cognitive process tracing methods. Such methods may find out what processes are involved in making point and interval judgments each one at a time or in conjunction with each other. This would probably be a very informative way to further investigate the usefulness of self-selected intervals.

To summarize, the self-selected interval method shows promise when studying a person’s subjective estimates and numerical judgments, whether it is cognitive and preference driven as in previous WTP studies or perceptual and belief driven as in this study. However, there is still work to be done regarding standardization of the procedure and finding out how the interval interacts with judgments and estimates depending on the task’s properties in the dimensions subjective/objective and cognitive/perceptual.

Supporting information

S1 Fig. Illustration (1 of 3) of each participant’s average judgments for each of the boxes and judgment methods.

The figure is conceptually the same as Fig 2, which introduces the result section. Points indicate point judgments, intervals indicate the range of the upper and lower interval limits and the color darkens for each consecutive split (i.e. the darkest middle field indicates the upper and lower limits after the final interval split). True weights are described on the x-axis and judgments on the y-axis, thus, the diagonal line indicates true judgments. To these individual plots we have, in the title, added the number of size-weight illusion cases for Point (P), Mid1 (M1), Mid4 (M4) and no interval overlap (IO).

(TIFF)

S2 Fig. Illustration (2 of 3) of each participant’s average judgments for each of the boxes and judgment methods.

(TIFF)

S3 Fig. Illustration (3 of 3) of each participant’s average judgments for each of the boxes and judgment methods.

(TIFF)

S4 Fig. Average judgments for each of the trial sessions.

(TIFF)

S5 Fig. Average judgments relative the true weight of the box for each of the sessions.

(TIFF)

S6 Fig. Average log judgments and log true weights.

(TIFF)

S7 Fig. Density derived from judgments relative to the true density of the boxes.

To illustrate how the judgments relate to density depending on box size we first computed the density derived from judgments by dividing weight judgments by the true volume of the judged box. We then calculated the derived judged density relative to the true density. This was done to be able to visualize the full range of judgments in a single plot. Furthermore, with this calculation, a value of 1.0 means that the derived judged density is the same as the true density. The left panel illustrates average judgments for point judgments (solid dots) and mid4 (empty square), along with the average judged upper and lower interval limits (vertical lines showing the range of the interval). The right panel illustrates linear regression lines fitted to the same data used for the left panel. The solid lines are fitted to point judgments and the dotted lines are fitted to the mid4 judgments. The figure shows that the expected density is not the same for the different boxes, because the regression lines intersect 1.0 on the y-axis at different points for each of the box sizes, large (blue), medium (red) and small (green). The average expected density for each box size is indicated by the value on the x-axis where the regression line goes through the horizontal line at 1.0 on the y-axis.

(TIFF)

S1 Table. Model estimates: Mixed effect model illustrated in Fig 6.

True weight of boxes and judgment estimates were log transformed for this analysis. Main effects of true box weight, the type of judgment estimate (point vs mid4), and size (small, medium, large). First-order interaction effects between true weights, type of judgment estimate, and size. Random effects for true weight, type of judgement estimate, and size.

(DOCX)

Acknowledgments

We would like to thank Angel Angelov for valuable input on the statistical procedures in this manuscript.

Data Availability

All relevant data are available on Figshare (DOI: 10.17045/sthlmuni.19213245).

Funding Statement

Marianne & Marcus Wallenberg Foundation, Project MMW 2017.0075, “The right question - new ways to elicit quantitative information in surveys”. Recipient: BK (main applicant; ME and MN subapplicants) (web: https://mmw.wallenberg.org/startsida). No commercial companies supported the study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Henrion M, Fischhoff B. Assessing uncertainty in physical constants. Am J Phys. 1986 Sep;54(9):791–8. Available from: http://aapt.scitation.org/doi/10.1119/1.14447
  • 2. Angelov AG, Ekström M. Maximum likelihood estimation for survey data with informative interval censoring. AStA Adv Stat Anal. 2019 Jun;103(2):217–36.
  • 3. Belyaev Y, Kriström B. Two-Step Approach to Self-Selected Interval Data in Elicitation Surveys. SSRN Electron J. 2012.
  • 4. Saccone EJ, Landry O, Chouinard PA. A meta-analysis of the size-weight and material-weight illusions. Psychon Bull Rev. 2019 Aug 1;26(4):1195–212. doi: 10.3758/s13423-019-01604-x
  • 5. Chirimuuta M. Why the “stimulus-error” did not go away. Stud Hist Philos Sci Part A. 2016 Apr;56:33–42.
  • 6. Angelov AG, Ekström M. Nonparametric estimation for self-selected interval data collected through a two-stage approach. Metrika. 2017 May;80(4):377–99. Available from: http://link.springer.com/10.1007/s00184-017-0610-7
  • 7. Teigen KH, Jørgensen M. When 90% confidence intervals are 50% certain: on the credibility of credible intervals. Appl Cogn Psychol. 2005 May;19(4):455–75. Available from: http://doi.wiley.com/10.1002/acp.1085
  • 8. Teigen KH, Løhre E, Hohle SM. The boundary effect: Perceived post hoc accuracy of prediction intervals. Judgm Decis Mak. 2018;13(4):309–21.
  • 9. Nicolas S, Ross HE, Murray DJ. Charpentier’s Papers of 1886 and 1891 on Weight Perception and the Size-Weight Illusion. Percept Mot Skills. 2012 Aug;115(1):120–41. Available from: http://journals.sagepub.com/doi/10.2466/24.22.27.PMS.115.4.120-141
  • 10. Amazeen EL. The Effects of Volume on Perceived Heaviness by Dynamic Touch: With and Without Vision. Ecol Psychol. 1997 Dec;9(4):245–63. Available from: http://www.tandfonline.com/doi/abs/10.1207/s15326969eco0904_1
  • 11. Plaisier MA, Smeets JBJ. Mass Is All That Matters in the Size–Weight Illusion. Paul F, editor. PLoS ONE. 2012 Aug;7(8):e42518. doi: 10.1371/journal.pone.0042518
  • 12. Amazeen EL, Tseng PH, Valdez AB, Vera D. Perceived Heaviness Is Influenced by the Style of Lifting. Ecol Psychol. 2011 Jan;23(1):1–18. Available from: http://www.tandfonline.com/doi/abs/10.1080/10407413.2011.539100
  • 13. Peters MAK, Balzer J, Shams L. Smaller = Denser, and the Brain Knows It: Natural Statistics of Object Density Shape Weight Expectations. Malo J, editor. PLOS ONE. 2015 Mar;10(3):e0119794. doi: 10.1371/journal.pone.0119794
  • 14. Peters MAK, Ma WJ, Shams L. The Size-Weight Illusion is not anti-Bayesian after all: a unifying Bayesian account. PeerJ. 2016 Jun;4:e2124. doi: 10.7717/peerj.2124
  • 15. Buckingham G, Byrne CM, Paciocco J, Eimeren L van, Goodale MA. Weightlifting exercise and the size-weight illusion. Atten Percept Psychophys. 2014 Dec;76(2):452–9. doi: 10.3758/s13414-013-0597-8
  • 16. Vicovaro M, Burigana L. Properties of the size-weight illusion as shown by lines of subjective equality. Acta Psychol (Amst). 2014 Jun;149:52–9. doi: 10.1016/j.actpsy.2014.03.001
  • 17. Flanagan JR, Beltzner MA. Independence of perceptual and sensorimotor predictions in the size–weight illusion. Nat Neurosci. 2000 Jul;3(7):737–41. doi: 10.1038/76701
  • 18. Grandy MS, Westwood DA. Opposite Perceptual and Sensorimotor Responses to a Size-Weight Illusion. J Neurophysiol. 2006 Jun;95(6):3887–92. doi: 10.1152/jn.00851.2005
  • 19. Dunn TR, Harshman RA. A multidimensional scaling model for the size-weight illusion. Psychometrika. 1982 Mar;47(1):25–45.
  • 20. R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
  • 21. RStudio Team (2020). RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA. URL: http://www.rstudio.com/.
  • 22. Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2021). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1–149, https://CRAN.R-project.org/package=nlme.
  • 23. Bates D, Mächler M, Bolker B, Walker S (2015). “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software, 67(1), 1–48. doi: 10.18637/jss.v067.i01
  • 24. Lüdecke D (2021). sjPlot: Data Visualization for Statistics in Social Science. R package version 2.8.7, https://CRAN.R-project.org/package=sjPlot.
  • 25. The jamovi project (2021). jamovi (Version 1.6) [Computer Software]. Retrieved from https://www.jamovi.org.
  • 26. Montgomery DC. Design and Analysis of Experiments. 8th ed. John Wiley & Sons, Hoboken, NJ; 2012.
  • 27. Zuur A, Ieno EN, Walker N, Saveliev AA, Smith GM. Mixed Effects Models and Extensions in Ecology with R [Internet]. Springer Science & Business Media. 2009. [cited 2021 Mar 30]. Available from: https://link.springer.com/book/ doi: 10.1007/978-0-387-87458-6
  • 28. Freeman CG, Saccone EJ, Chouinard PA. Low-level sensory processes play a more crucial role than high-level cognitive ones in the size-weight illusion. Buckingham G, editor. PLOS ONE. 2019 Sep;14(9):e0222564. doi: 10.1371/journal.pone.0222564
  • 29. Belyaev YuK, Kriström B. Analysis of contingent valuation data with self-selected rounded wtp-intervals collected by two-steps sampling plans. In: Multivariate Statistics: Theory and Applications [Internet]. WORLD SCIENTIFIC; 2012. [cited 2021 Oct 25]. p. 48–60. Available from: https://www.worldscientific.com/doi/abs/10.1142/9789814449403_0004
  • 30. Mandel D, Collins R, Risko E, Fugelsang JA. Effect of Confidence Interval Construction on Judgment Accuracy. 2020.

Decision Letter 0

Piers D L Howe

22 Jul 2021

PONE-D-21-18240

Self-selected interval judgments compared to point judgments: A weight judgment experiment in the presence of the size-weight illusion.

PLOS ONE

Dear Dr. Gonzalez,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

While you will need to address all the concerns of the reviewers, the following concerns seem to be particularly important:

1) As the first reviewer points out, Buckingham et al. (2014) also had participants make point estimates of the weight of objects, so should be cited.

2) Please discuss to what extent density influences weight judgments.

3) As pointed out by the third reviewer, the literature review is incomplete. Please cite and discuss more work that examines the size-weight illusion. To address the concern of the second reviewer, this review should show that the issue you address is not a straw man and that numerical point estimates are common.

4) Please describe your participants in more detail – they may be unusually good at this task.

5) I agree with the third reviewer that there is a potential confound between the conditions – participants provided more estimates in one condition than in the other. This issue needs to be discussed.

6) Please indicate which approach should be used in which circumstances.

Please submit your revised manuscript by Sep 05 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Piers D. L. Howe

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please ensure that you include a title page within your main document. We do appreciate that you have a title page document uploaded as a separate file, however, as per our author guidelines (http://journals.plos.org/plosone/s/submission-guidelines#loc-title-page) we do require this to be part of the manuscript file itself and not uploaded separately.

Could you therefore please include the title page into the beginning of your manuscript file itself, listing all authors and affiliations.

3. You indicated that ethical approval was not necessary for your study. Could you please provide further details on why your study is exempt from the need for approval and confirmation from your institutional review board or research ethics committee (e.g., in the form of a letter or email correspondence) that ethics review was not necessary for this study? Please include a copy of the correspondence as an "Other" file.

4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

5. Please ensure that you refer to Figure 6 in your text as, if accepted, production will need this reference to link the reader to the figure.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments:

The study is not flashy but I appreciate that it reports a scientifically sound experiment and that these simple experiments that explore methodological issues are of value to the scientific community. I think the paper should be publishable but that some changes are required to increase clarity (see below re issues with the explanations of analyses and also figures) and also substance of the paper.

Specific issues:

Page 3, line 96. I believe this paper has assessed the SWI in terms of estimates of weight: Buckingham, G., Byrne, C. M., Paciocco, J., van Eimeren, L., & Goodale, M. A. (2014). Weightlifting exercise and the size–weight illusion. Attention, Perception, & Psychophysics, 76(2), 452-459.

I don’t think this invalidates the current study but it does reduce the novelty. Personally I think the study still has value but the authors need to adjust their statements throughout the manuscript with respect to this issue.

The meta-analysis that the authors cite (along with some other papers) strongly suggests that density influences weight judgments including the SWI. The authors report stimulus density but then do not report whether the manipulations impact the effect of density on weight judgments (overall accuracy as well as the SWI). I think this would be a useful addition to the paper and will make it more substantial and informative.

Minor issues:

Page 1, lines 34-40. “Another benefit in using this approach… in respect to both accuracy and illusion.” I do not understand what the authors are saying here. I think they need to unpack it more.

Page 2 lines 68-80. The relevance of this paragraph for the current study should be made clearer.

I think the authors need to explain why they included the average log estimates (Fig 2 right panel).

I find Figure 3 to be very unclear and think it needs to be revised. The lines and colours are difficult to see. Maybe use different colours for small/medium vs medium/large or else plot them separately. Also, I think it will be easier to understand if the x axis has the actual weights instead of weight rank (because weight rank is not at all intuitive). Likewise I think similar changes should be made to Figure 4 for the sake of readability.

Page 9 – I do not understand why the authors describe ANOVAs for the SWI-factor but then do not report them. This needs to be reported or at least explained more clearly.

Page 10, line 395 – what test was done to determine statistical nonsignificance? It should be reported.

Page 10, final paragraph – now I see that the log weights are analysed but again I do not know why. The authors need to explain this, as well as the maximum likelihood estimation analysis. Also, they refer to Figure 3 here – is this correct?

Page 11, line 424-5 – hit rates were higher for smaller boxes. Is this because of density?

Page 13, line 530-1 – the fact that the illusion is “cognitively impenetrable” is not new, I think at least one citation here for this previous finding is necessary.

Typos:

Page 2 line 51: should be “(WTP)”

Page 7 line 241: should be “experiment’s”

Page 7 line 242: should be “whole;” or “whole.” instead of “whole,”

Page 8 line 315: should be “participant’s”

Page 11 line 450: should be “overall”

Page 13 line 509: should be “were” instead of “was”

Reviewer #2: This is a well-written paper comparing interval-based judgements of weight with point judgements of weight. But, while there is much to like about the paper, I feel that it is providing a solution to a problem that does not exist. No one in the literature uses fixed point judgements for weight in the way that the authors compare their new method to, so any comparison is simply a straw man. I’m sorry I cannot be more positive, but the study is simply too shallow to warrant publication in this outlet in my view.

The financial disclosures statement isn’t really a statement

No ethical review and approval is rather surprising for this kind of study – presumably the journal has a policy on this?

Line 127 - Order of tasks – where did this factor come from? Surely the task itself is the factor here (and within subject?)

Line 133 – how were the boxes weighted? Was the centre of mass in the physical centre?

Line 133- More information needed on the procedure of lifting each box – it’s quite vague

Point judgement task is quite an odd one – without anything to calibrate to, reporting weight in KG is not something people can do naturally because it’s not something they ever have to do. They can make relative judgements well, but this task sits awkwardly between a scaled judgement and an absolute magnitude estimation. The authors do acknowledge this in the results section, and discuss participant strategies for completing this task.

Line 193 – I think it was three lifts per box, but this is a bit confusingly presented

Line 528 – the line beginning “Several participants showed signs…” isn’t quite worded correctly. But I’m not really sure what those sentences about the well-established cognitive impenetrability of the size-weight illusion are trying to accomplish.

This article seems of relevance: https://www.sciencedirect.com/science/article/abs/pii/S0001691814000663

Reviewer #3: The authors measured and compared weight estimates of lifting and hefting objects using two different approaches: “point” and “self-selected interval” judgements – the former being the more prevalent approach. Task objects reflected the size-weight illusion, where objects of the same weight differing in size were presented (smaller ones typically feel much heavier). The authors considered whether the magnitude and variability of the illusion differed depending on which approach was used. I think the procedures were carried out rigorously and hence I won’t comment much about the details. I do have a few larger issues. The paper has merit for publication and it would be great if they can be addressed.

1) I am unable to tell you how much an object weighs in grams. We tried this a bit in my lab and it seems that the ‘psychology’ participants we recruit aren’t good either but, in all fairness, we never examined this systematically. Nonetheless, I am a little surprised that your participants can do this and I would like to know more why and how. What kind of students did you recruit? How did you train them to do this? There was some mention of having lifted a reference mug and how a participant said they were comparing the task objects to a litre of milk which weighs 1 kg (I did not ever think about this until now nor would most people know this). More information is needed. It would also be good to cite more work that examined how well people can judge weights in grams.

2) The two judgement procedures differ in more than one way. Hence, it is difficult to know exactly how the two can lead to differential effects. One concern I have is that they did not match in the number of iterations provided. Namely, for a particular ‘trial’, participants provided one judgment for the “point” approach and multiple ones for the “self-selected interval”. Can resulting differences arise not because of the strategic process entailed but rather because of the number of times people could provide an estimate? Would it have been better to allow participants to provide the same number of estimates in the “point” approach?

3) I felt the paper ended anticlimactically. The aim was to compare two methodological approaches and it would be a fair expectation for the reader to receive recommendations as to which of the two approaches should be used and under what contexts.

4) The discussion could also expand on what the results could tell us about the size-weight illusion.

5) I did not understand what you were trying to say in lines 34 to 40. Please unpack.

6) Fix “size-size weight illusion” in line 502.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Mar 16;17(3):e0264830. doi: 10.1371/journal.pone.0264830.r002

Author response to Decision Letter 0


12 Nov 2021

The following responses are included in the rebuttal letter included in this submission.

Dear Dr. Howe,

We have considered the issues and problems raised in the reviews. Please find our responses below. Our responses are marked with E (editor), R1, R2, and R3 (reviewer 1, 2, and 3).

While you will need to address all the concerns of the reviewers, the following concerns seem to be particularly important:

1) As the first reviewer points out, Buckingham et al. (2014) also had participants make point estimates of the weight of objects, so should be cited.

E.1

We thank the reviewer for this reference. We added the reference and some text linking our study to that study.

2) Please discuss to what extent density influences weight judgments.

E.2

We acknowledge that density has been considered to a great extent in the size-weight illusion (SWI) literature. Because density was not the main focus of this paper, we have now added figures describing the relationship between density and judgment accuracy as supplementary material.

3) As pointed out by the third reviewer, the literature review is incomplete. Please cite and discuss more work that examines the size-weight illusion. To address the concern of the second reviewer, this review should show that the issue you address is not a straw man and that numerical point estimates are common.

E.3

We added to the references, 15. Buckingham G, Byrne CM, Paciocco J, Eimeren L van, Goodale MA. Weightlifting exercise and the size-weight illusion. Atten Percept Psychophys. 2014 Dec;76(2):452–9., showing that objective weights have been asked for in previous SWI judgements. We also expanded the introductory section about SWI to include some of the different ways used to investigate the illusion.

4) Please describe your participants in more detail – they may be unusually good at this task.

E.4

We had, and still have, no specific reason to believe that our participants were better or worse than the average population within the same age range. Therefore, we did not record demographic data other than age and gender. Furthermore, our main purpose was to compare methods within individuals and we believe that these comparisons should be informative independent of variations in participants’ performance.

5) I agree with the third reviewer that there is a potential confound between the conditions – participants provided more estimates in one condition than in the other. This issue needs to be discussed.

E.5

We agree, and have expanded the text in the discussion to cover this possible confound and have suggested ways to deal with this problem in future experiments.

6) Please indicate which approach should be used in which circumstances.

E.6

We have expanded the discussion section to include factors that may affect when the interval method is preferred over point estimates, for example, that the method may become more useful for very difficult tasks, or with expertise.

Reviewer #1:

General comments:

The study is not flashy but I appreciate that it reports a scientifically sound experiment and that these simple experiments that explore methodological issues are of value to the scientific community. I think the paper should be publishable but that some changes are required to increase clarity (see below re issues with the explanations of analyses and also figures) and also substance of the paper.

Specific issues:

1) Page 3, line 96. I believe this paper has assessed the SWI in terms of estimates of weight: Buckingham, G., Byrne, C. M., Paciocco, J., van Eimeren, L., & Goodale, M. A. (2014). Weightlifting exercise and the size–weight illusion. Attention, Perception, & Psychophysics, 76(2), 452-459.

I don’t think this invalidates the current study but it does reduce the novelty. Personally I think the study still has value but the authors need to adjust their statements throughout the manuscript with respect to this issue.

R1.S1

Thank you for pointing out that we missed this study. We have cited the study and clarified that we only replicate the findings of SWI for judgments of true weight.

2) The meta-analysis that the authors cite (along with some other papers) strongly suggests that density influences weight judgments including the SWI. The authors report stimulus density but then do not report whether the manipulations impact the effect of density on weight judgments (overall accuracy as well as the SWI). I think this would be a useful addition to the paper and will make it more substantial and informative.

R1.S2

As mentioned in the earlier response, we have added plots with density and some comments.

Minor issues:

1) Page 1, lines 34-40. “Another benefit in using this approach… in respect to both accuracy and illusion.” I do not understand what the authors are saying here. I think they need to unpack it more.

R1.M1

We have rewritten this paragraph (now starting at line 52) to make it clearer.

2) Page 2 lines 68-80. The relevance of this paragraph for the current study should be made clearer.

R1.M2

The paragraph (now starting at line 52) has been rewritten to clarify that we described a previous approach studying judgments of intervals and that the main purpose of self-selected intervals should not be affected by participants’ confidence in their judgments, which was found for judgments of intervals.

3) I think the authors need to explain why they included the average log estimates (Fig 2 right panel).

R1.M3

We have clarified that the responses take a linear form when log transformed in conjunction with the figure and when explaining the multi-level model.

4) I find Figure 3 to be very unclear and think it needs to be revised. The lines and colours are difficult to see. Maybe use different colours for small/medium vs medium/large or else plot them separately. Also, I think it will be easier to understand if the x axis has the actual weights instead of weight rank (because weight rank is not at all intuitive). Likewise I think similar changes should be made to Figure 4 for the sake of readability.

R1.M4

We agree that reporting the ranks obscures the meaning of what we want to communicate. Therefore, Figures 3 and 4 were revised so that there are separate panels for small/medium and medium/large comparisons and we hope that they are now more easily interpreted. The x-axis is now marked with weights in grams instead of ranks. The text was also changed to report weights instead of ranks.

5) Page 9 – I do not understand why the authors describe ANOVAs for the SWI-factor but then do not report them. This needs to be reported or at least explained more clearly.

R1.M5

We have clarified that p-values from the F-test should not be interpreted in an exploratory analysis and we only report CI as a measure of confidence/accuracy of our estimates, lines 410 – 414.

6) Page 10, line 395 – what test was done to determine statistical nonsignificance? It should be reported.

R1.M6

Dependent t-tests showed no significant difference in average accuracy between the different measures, and we have clarified this in the text (line 451).
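For readers unfamiliar with the dependent (paired) t-test referred to in this response, the statistic can be sketched as follows. This is a minimal illustration in Python, not the analysis code used in the study; the function name `paired_t_statistic` and the accuracy values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(x, y):
    """Return (t, df) for a dependent (paired) t-test on two equal-length samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    # Work on within-participant differences, one per participant.
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-participant accuracy scores for two judgment methods.
point_accuracy = [0.62, 0.55, 0.71, 0.48, 0.66]
interval_accuracy = [0.60, 0.58, 0.69, 0.52, 0.65]
t_value, df = paired_t_statistic(point_accuracy, interval_accuracy)
print(f"t({df}) = {t_value:.3f}")
```

The statistic would then be compared against a t distribution with n − 1 degrees of freedom to obtain a p-value.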

7) Page 10, final paragraph – now I see that the log weights are analysed but again I do not know why. The authors need to explain this, as well as the maximum likelihood estimation analysis. Also, they refer to Figure 3 here – is this correct?

R1.M7

We have clarified that the logarithms give a linear response pattern, and that restricted maximum likelihood gives less biased estimators of variance components according to Zuur et al. (2009). The figure reference has now been corrected to Fig 6, and the table reference has been corrected to S8 Table. Thank you for pointing this out.

8) Page 11, line 424-5 – hit rates were higher for smaller boxes. Is this because of density?

R1.M8

As illustrated by the supplementary figure on density, on average, participants assumed different densities for the different sizes of boxes (the point where the regression line fitted to density derived from judgments intersects the true density). For this reason, it is difficult to arrive at the conclusion that density in itself is a determinant of judgment. Furthermore, there are several other variables that may explain why the interval hit-rate was higher. For example, participants used a wider interval for the small box, which increases the chances of including the true weight. The reasons for using wide intervals may vary and we do not have sufficient data, and/or a clear hypothesis, to disentangle this question.
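The point about interval width and hit rate can be illustrated with a small sketch. It assumes that a "hit" means the true weight lies within the judged interval (bounds included); the function names and all numbers below are hypothetical and are not taken from the study's data.

```python
def hit_rate(intervals, true_weight):
    """Proportion of (low, high) intervals that contain the true weight."""
    hits = sum(1 for low, high in intervals if low <= true_weight <= high)
    return hits / len(intervals)


def mean_width(intervals):
    """Average width (high - low) of the judged intervals."""
    return sum(high - low for low, high in intervals) / len(intervals)


# Hypothetical interval judgments (in grams) for a box that truly weighs 400 g.
true_weight = 400
narrow_intervals = [(300, 400), (350, 450), (420, 500), (250, 380)]
wide_intervals = [(200, 500), (300, 600), (350, 700), (150, 450)]

for label, intervals in [("narrow", narrow_intervals), ("wide", wide_intervals)]:
    print(label, mean_width(intervals), hit_rate(intervals, true_weight))
```

In this made-up example the wider intervals capture the true weight more often, which shows why a higher hit rate for one box size cannot by itself be attributed to density.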

9) Page 13, line 530-1 – the fact that the illusion is “cognitively impenetrable” is not new, I think at least one citation here for this previous finding is necessary.

R1.M9

We have added a reference (27. Freeman CG, Saccone EJ, Chouinard PA. Low-level sensory processes play a more crucial role than high-level cognitive ones in the size-weight illusion.) indicating that this has previously been investigated and that our findings are in line with this previous finding. Lines 578-579.

10) Typos:

Page 2 line 51: should be “(WTP)”

Page 7 line 241: should be “experiment’s”

Page 7 line 242: should be “whole;” or “whole.” instead of “whole,”

Page 8 line 315: should be “participant’s”

Page 11 line 450: should be “overall”

Page 13 line 509: should be “were” instead of “was”

R1.M10

Thank you for noticing and informing us about these typos. The text has been corrected according to your suggestions.

Reviewer #2:

General comments:

This is a well-written paper comparing interval-based judgements of weight with point judgements of weight. But, while there is much to like about the paper, I feel that it is providing a solution to a problem that does not exist. No one in the literature uses fixed point judgements for weight in the way that the authors compare their new method to, so any comparison is simply a straw man. I’m sorry I cannot be more positive, but the study is simply too shallow to warrant publication in this outlet in my view.

R2.GC

Although uncommon in the literature, reviewer 1 pointed out to us that there actually exists at least one other study that has investigated the SWI and collected judgments in the form of kg or pounds. This reference has been added to the manuscript.

In everyday life, when a person lifts an object and reflects on what it may weigh, their thoughts will most probably be in the form of the weight metric they are familiar with. This applies when weight is communicated from one person to another as well. Hence, we used this measure to compare the accuracy in judging this metric for the point and interval judgment methods.

All participants thought of a reference weight, and this is now mentioned in the manuscript. Furthermore, many guess-the-weight competitions can easily be found, for example with Google, indicating that this is something people do outside the lab.

1) The financial disclosures statement isn’t really a statement

R2.1

To the best of our knowledge, we have followed the instruction guidelines on what information to provide. In other words, we unfortunately do not know what specific information should be added, beyond what has already been stated, to make it a real statement.

2) No ethical review and approval is rather surprising for this kind of study – presumably the journal has a policy on this?

R2.2

We have stated to the journal that this type of study does not need approval under current Swedish law. We have also provided, with this revision, a statement confirming this, signed by the deputy head of the department.

According to Swedish law, ethics approval is needed only if an intervention is made aimed at changing a person’s physical or mental state. Our method was to only measure their responses to stimuli under the physical and psychological conditions that they arrived at the experiment in; therefore, approval from the Swedish ethics approval authority should not be needed.

3) Line 127 - Order of tasks – where did this factor come from? Surely the task itself is the factor here (and within subject?)

R2.3

This paragraph was unclear. The factor was within subject. We have rewritten the text to clarify that we randomized, for each participant (lines 262 – 265), which of the two judgment methods (point or interval) was performed first. This was used primarily to control for confounds from one task being performed before the other, and no such confounds were found throughout the process of analyzing the data.
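As an illustration only, per-participant randomization of task order could be sketched as below. This is a minimal sketch under assumptions: the function name `assign_task_order`, the participant IDs, and the seed are hypothetical, and it is not the procedure or software actually used in the experiment.

```python
import random


def assign_task_order(participant_ids, seed=None):
    """Randomly decide, for each participant, which judgment method comes first.

    Hypothetical helper: returns a dict mapping participant ID to a tuple
    (first_method, second_method).
    """
    rng = random.Random(seed)  # a seeded generator makes the assignment reproducible
    orders = {}
    for pid in participant_ids:
        methods = ["point", "interval"]
        rng.shuffle(methods)  # both orders are equally likely
        orders[pid] = tuple(methods)
    return orders


# Hypothetical usage with six made-up participant IDs.
orders = assign_task_order(range(1, 7), seed=1)
for pid, (first, second) in orders.items():
    print(f"participant {pid}: {first} first, then {second}")
```

Independent shuffling per participant does not guarantee equal group sizes for the two orders; a counterbalanced assignment would be an alternative design choice.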

4) Line 133 – how were the boxes weighted? Was the centre of mass in the physical centre?

R2.4

The weights were adhered along the inside of the box to achieve a feeling of a uniform weight. The weight was then adjusted by adding cotton inside the box so that precision at the level of single grams could be achieved. A photo illustrating the inside of the boxes has been added (Fig 2) to the manuscript along with text describing the construction.

5) Line 133- More information needed on the procedure of lifting each box – it’s quite vague

R2.5

We have clarified (lines 239 – 246) the procedure regarding the one hand grip and how the experiment leader illustrated the grips.

6) Point judgement task is quite an odd one – without anything to calibrate to, reporting weight in KG is not something people can do naturally because it’s not something they ever have to do. They can make relative judgements well, but this task sits awkwardly between a scaled judgement and an absolute magnitude estimation. The authors do acknowledge this in the results section, and discuss participant strategies for completing this task.

R2.6

We agree that it may be a bit odd in traditional experimental settings. However, it does reflect how people usually communicate weight to one another as well as being the way weight is usually reported when it is measured by a scale (or other weight measuring methods).

7) Line 193 – I think it was three lifts per box, but this is a bit confusingly presented

R2.7

We have clarified that they judged each of the boxes once per session, and that they judged the boxes in the sequential order in which they were presented to the participants in each session (lines 247 – 254).

8) Line 528 – the line beginning “Several participants showed signs…” isn’t quite worded correctly. But I’m not really sure what those sentences about the well-established cognitive impenetrability of the size-weight illusion are trying to accomplish

R2.8

We have improved the wording of the sentence. We have also rewritten parts of this paragraph (lines 620 – 625) to be clearer about the reasoning about how the stimuli and processes involved may affect whether or not the interval-method is an improvement over traditional point estimates.

9) This article seems of relevance: https://www.sciencedirect.com/science/article/abs/pii/S0001691814000663

R2.9

We agree, and it is now included as a reference in the introductory section describing SWI in relation to our experiment.

Reviewer #3:

General comments:

The authors measured and compared weight estimates of lifting and hefting objects using two different approaches: “point” and “self-selected interval” judgements – the former being the more prevalent approach. Task objects reflected the size-weight illusion, where objects of the same weight differing in size were presented (smaller ones typically feel much heavier). The authors considered whether the magnitude and variability of the illusion differed depending on which approach was used. I think the procedures were carried out rigorously and hence I won’t comment much about the details. I do have a few larger issues. The paper has merit for publication and it would be great if they can be addressed.

1) I am unable to tell you how much an object weighs in grams. We tried this a bit in my lab and it seems that the ‘psychology’ participants we recruit aren’t good either but, in all fairness, we never examined this systematically. Nonetheless, I am a little surprised that your participants can do this and I would like to know more why and how. What kind of students did you recruit? How did you train them to do this? There was some mention of having lifted a reference mug and how a participant said they were comparing the task objects to a litre of milk which weighs 1 kg (I did not ever think about this until now nor would most people know this). More information is needed. It would also be good to cite more work that examined how well people can judge weights in grams.

R3.1

In Sweden, where the experiment took place, grams are the most common metric for reporting weight. Telling the true weight, however, is hard, and most participants were far from correct. We had no prior expectation that judging weights accurately would be improved by academic education, as it is a sensorimotor task. For that reason, we did not keep a record of the educational level of our participants and cannot conclude from our data any correspondence between education and the ability to judge weights accurately. We did, however, ask participants after the lifting procedure was done whether they had thought about other objects whose weights they knew, and a liter of milk was a common response. These questions were open-ended and were used to gather information for the development of further studies, where such questions can be standardized so that they measure reference weights or other judgment tactics accurately and can be compared to judgment accuracy. This, however, is outside the scope of the present paper.

In our open-ended questions at the end of the experiment, all participants stated that they used some kind of reference weight, with 1 kg being the most common and 1 liter of milk the most common object representing that 1 kg reference weight. However, these responses were too unstructured to be used for an analysis of their effects on the judgments. We have added information about this to the discussion, and we encourage future studies to create structured responses to such questions in order to determine whether they are important for whether or not the interval method will be useful for the person.

2) The two judgement procedures differ in more than one way. Hence, it is difficult to know exactly how the two can lead to differential effects. One concern I have is that they did not match in the number of iterations provided. Namely, for a particular ‘trial’, participants provided one judgment for the “point” approach and multiple ones for the “self-selected interval”. Can the resulting differences be due not to the strategic process entailed but rather to the number of times people could provide an estimate? Would it have been better to allow participants to provide the same number of estimates in the “point” approach?

R3.2

We agree that the number of judgments per item can be a confounding variable. We have added this to the discussion about the time a box is held, and suggested how the two factors can be investigated in future studies to control for this problem.

3) I felt the paper ended anti-climatically. The aim was to compare two methodological approaches and it would be a fair expectation for the reader to receive recommendations as to which of the two approaches should be used and under what contexts.

R3.3 (same as E.6)

Unfortunately, we interpret the results as not being clear and/or strong enough to base recommendations or guidelines on. We have provided suggestions for future research to achieve this, and we would like to argue that more research on, for example, which individuals may or may not benefit from the interval approach is needed before any concrete advice should be formulated. Such studies can be focused either on judgment strategies or on individual differences in terms of background variables, such as experience with lifting weights where the weight in grams is known (e.g. in the gym or in the post office).

4) The discussion could also expand on what the results could tell us about the size-weight illusion.

This question is very interesting and we have expanded further on what the method may reveal about the SWI, at the end of the second paragraph of the discussion section.

5) I did not understand what you were trying to say in lines 34 to 40. Please unpack.

R3.5

We have rewritten this paragraph (starting at line 52) in an attempt to be more precise and clear.

6) Fix “size-size weight illusion” in line 502.

R3.6

Thank you for noticing; this has been corrected.

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf

and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please ensure that you include a title page within your main document. We do appreciate that you have a title page document uploaded as a separate file, however, as per our author guidelines (http://journals.plos.org/plosone/s/submission-guidelines#loc-title-page) we do require this to be part of the manuscript file itself and not uploaded separately.

Could you therefore please include the title page into the beginning of your manuscript file itself, listing all authors and affiliations.

JR.2

We have included the title page in the main manuscript file

3. You indicated that ethical approval was not necessary for your study. Could you please provide further details on why your study is exempt from the need for approval and confirmation from your institutional review board or research ethics committee (e.g., in the form of a letter or email correspondence) that ethics review was not necessary for this study? Please include a copy of the correspondence as an "Other" file.

JR.3

As we have stated for Reviewer 2, comment 2, approval is needed only if an intervention is made aimed at changing a person’s physical or mental state. Our method was to only measure their responses to stimuli under the physical and psychological conditions that they arrived at the experiment in; therefore, approval from the Swedish ethics approval authority should not be needed. We have also provided, with this revision, a statement confirming this, signed by the deputy head of the department.

4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

JR.4

No changes are needed. DOIs will be provided if the manuscript is accepted.

5. Please ensure that you refer to Figure 6 in your text as, if accepted, production will need this reference to link the reader to the figure.

We have corrected the figure numbers throughout the text.

Attachment

Submitted filename: Response to Reviwers.doc

Decision Letter 1

Piers D L Howe

15 Dec 2021

PONE-D-21-18240R1

Self-selected interval judgments compared to point judgments: A weight judgment experiment in the presence of the size-weight illusion.

PLOS ONE

Dear Dr. Gonzalez,

Thank you for submitting your revised manuscript to PLOS ONE. After careful consideration, we feel that although you addressed most of the concerns raised by the reviewers, there are still some outstanding concerns that you need to address. In particular, before we can accept your manuscript, you need to address the concerns raised by the first reviewer. The other two reviewers are satisfied with the manuscript as it currently stands.

Please submit your revised manuscript by Jan 29 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by Reviewer 1. You should upload this letter as a separate file labeled 'Response to Reviewer 1'. Based on what you write here, I will decide whether your manuscript needs to be reviewed again by Reviewer 1. I do not propose sending it back to Reviewers 2 and 3.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Piers D. L. Howe

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I think this is an improved, more substantive version of this paper and I appreciate the authors’ work on the manuscript. However, the authors make some claims based on results/data that they have not tested statistically. This is problematic. There are also some other more minor issues that I believe still need to be addressed before this work is publishable.

Major issues:

Line 405 states that there was an ANOVA (actually 2) but then the results are not reported. This is confusing and I still don’t understand why, even though they talk about a priori hypotheses etc. In their study aims paragraph in the introduction they state “We also wanted to compare the strength of the illusion across the two different judgment methods”. Then, on line 421 they say “We interpret this as the SWI-reducing effect of the interval method…” as well as similar claims/interpretations in the Discussion. The authors should not claim an effect of a manipulation without having inferential statistical proof of the effect. This needs to be addressed in some way. Similarly, the figure 4 and 5 captions also refer to “interaction plots” which I think is misleading given that they haven’t tested an interaction.

The same issue remains regarding variability (ie SD) in judgements on line 537 onwards. They claim differences that have not been tested statistically.

Lines 620 onwards – this paragraph doesn’t make sense. Not “thinking the illusion away” is what many previous authors have meant by stating it is “cognitively impenetrable”. This is also not what is meant in the literature base by the bottom up vs top down issue. I think the authors need to research these issues more carefully if they want to include discussion about them.

Also, one of the primary “bottom-up” explanations is related to density perception, which the authors have now referred to in a supplementary figure following my request, but this is still not mentioned in the main manuscript. I still think this is problematic for the reasons I stated in my original review, including that it is reported as an (important) stimulus characteristic. I believe at least a brief discussion of the potential contribution of density to their results and a statement pointing the reader to the supplementary figure is warranted.

Minor issues:

Line 33 “the smaller of two identical (w.r.t shape and size) objects” does not make sense ie there can’t be a smaller of two objects with identical size. Maybe the authors mean weight instead of size?

Line 131 onwards: “Furthermore, focusing on weight, rather than size, has been found to increase the illusion (19). This suggests that asking for the true weight should increase the illusion, because it leads the participants to focus on weight. Hence, differences between judgment methods should be easier to find with a strong illusion when the true weight is judged, instead of subjective heaviness being judged.”

I do not agree with the statement starting with “this suggests…”. I think focusing on true weight instead of heaviness is very different from focusing on weight instead of size. My intuition is that the former would decrease the illusion if anything. I’m not suggesting the authors change their prediction on this issue to accord with mine necessarily but I think whatever they predict on the issue needs to be better substantiated than it is currently. Otherwise I’d say it’s best to remain agnostic on the issue if they cannot.

Typo line 168 – “where” instead of “were”. Same on line 197 “boxes where filled with cotton”.

I think Fig1 was missing.

Fig5 – not sure if it is a problem with the figure or just how it uploaded on the system but the line for mid4 is not visible.

Reviewer #2: The authors have made extensive revisions, for which they should be commended. I still am not overly convinced by the rebuttal to my main concern (the straw man) - the article which is suggested by R1 seems to be done in a very specific context (weight lifters judging the weight of dumbbells, of which presumably they have some expertise already) and the argument that 'guess the weight contests exist' is precisely the argument against why anyone would ever use a point estimate in a weight-judgement task such as this one - because people are pretty hopeless at the task. However I note that the other reviewers didn't share my concerns and, as the manuscript is an otherwise fine piece of work, I do not feel it is my role to block its publication.

Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Mar 16;17(3):e0264830. doi: 10.1371/journal.pone.0264830.r004

Author response to Decision Letter 1


10 Feb 2022

Dear Dr. Howe,

We have revised our manuscript in response to the issues raised by Reviewer 1. Please find our responses below. We have numbered the questions 1 – 9 and marked our responses with R and the corresponding number. Reviewer issues are in italics and responses in normal text.

Reviewer #1: I think this is an improved, more substantive version of this paper and I appreciate the authors’ work on the manuscript. However, the authors make some claims based on results/data that they have not tested statistically. This is problematic. There are also some other more minor issues that I believe still need to be addressed before this work is publishable.

Major issues:

1. Line 405 states that there was an ANOVA (actually 2) but then the results are not reported. This is confusing and I still don’t understand why, even though they talk about a priori hypotheses etc. In their study aims paragraph in the introduction they state “We also wanted to compare the strength of the illusion across the two different judgment methods”. Then, on line 421 they say “We interpret this as the SWI-reducing effect of the interval method…” as well as similar claims/interpretations in the Discussion. The authors should not claim an effect of a manipulation without having inferential statistical proof of the effect. This needs to be addressed in some way. Similarly, the figure 4 and 5 captions also refer to “interaction plots” which I think is misleading given that they haven’t tested an interaction.

R1. We agree that the section was confusing, and we have rewritten it to make clearer that it was specifically the peak for which we had no prior hypothesis. We have also added ANOVA analyses of the main effect of measure and of its interaction with weight, which causes the peak for point measures. [lines 406 – 437]

2. The same issue remains regarding variability (ie SD) in judgements on line 537 onwards. They claim differences that have not been tested statistically.

R2. We have added t-tests for these differences. [lines 541 – 542]
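For readers unfamiliar with this kind of comparison, the sketch below shows how a paired t statistic on per-participant standard deviations could be computed. It rests on assumptions: the response does not say whether the added t-tests were paired, the function name `paired_t` is hypothetical, and the numbers are made-up placeholders rather than data from the study.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """Return (t, df) for a paired t-test on two equal-length samples.

    Hypothetical helper; each position is assumed to hold the two
    measurements from the same participant.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Made-up per-participant SDs (in grams) for the two judgment methods.
sd_point = [120.0, 95.0, 140.0, 110.0, 130.0]
sd_interval_mid = [100.0, 90.0, 115.0, 105.0, 112.0]

t_value, df = paired_t(sd_point, sd_interval_mid)
print(f"t({df}) = {t_value:.2f}")
```

The p-value would then be read from a t distribution with `df` degrees of freedom, for example using a statistics package.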

3. Lines 620 onwards – this paragraph doesn’t make sense. Not “thinking the illusion away” is what many previous authors have meant by stating it is “cognitively impenetrable”. This is also not what is meant in the literature base by the bottom up vs top down issue. I think the authors need to research these issues more carefully if they want to include discussion about them.

R3. The intended meaning of this paragraph does not add anything of importance and we have therefore removed it.

4. Also, one of the primary “bottom-up” explanations is related to density perception, which the authors have now referred to in a supplementary figure following my request, but this is still not mentioned in the main manuscript. I still think this is problematic for the reasons I stated in my original review, including that it is reported as an (important) stimulus characteristic. I believe at least a brief discussion of the potential contribution of density to their results and a statement pointing the reader to the supplementary figure is warranted.

R4. We have clarified in the stimuli description that density was used to center the entire stimulus set around a parameter and that it was not intended as a main research topic for this study [lines 175 – 178]. We acknowledge that density matters for SWI studies and it is now discussed briefly along with directions to the figure [lines 612 – 619].

Minor issues:

5. Line 33 “the smaller of two identical (w.r.t shape and size) objects” does not make sense ie there can’t be a smaller of two objects with identical size. Maybe the authors mean weight instead of size?

R5. Thank you for pointing out this clearly confusing sentence; it has been rewritten.

6. Line 131 onwards: “Furthermore, focusing on weight, rather than size, has been found to increase the illusion (19). This suggests that asking for the true weight should increase the illusion, because it leads the participants to focus on weight. Hence, differences between judgment methods should be easier to find with a strong illusion when the true weight is judged, instead of subjective heaviness being judged.”

I do not agree with the statement starting with “this suggests…”. I think focusing on true weight instead of heaviness is very different from focusing on weight instead of size. My intuition is that the former would decrease the illusion if anything. I’m not suggesting the authors change their prediction on this issue to accord with mine necessarily but I think whatever they predict on the issue needs to be better substantiated than it is currently. Otherwise I’d say it’s best to remain agnostic on the issue if they cannot.

R6. We agree that the reasoning does not hold when assuming that judgments of true weight should increase the illusion more than subjective judgments of heaviness do. The text has been revised accordingly [lines 132 - 134].

7. Typo line 168 – “where” instead of “were”. Same on line 197 “boxes where filled with cotton”.

R7. We have corrected this, thank you for noticing.

8. I think Fig1 was missing.

R8. We have checked, and the figure should now be included.

9. Fig5 – not sure if it is a problem with the figure or just how it uploaded on the system but the line for mid4 is not visible.

R9. We have now checked the figures with the Preflight Analysis and Conversion Engine (PACE), as requested by PLOS ONE; no problems were reported and the lines are visible.

Attachment

Submitted filename: Response to Reviwer 1.docx

Decision Letter 2

Piers D L Howe

18 Feb 2022

Self-selected interval judgments compared to point judgments: A weight judgment experiment in the presence of the size-weight illusion.

PONE-D-21-18240R2

Dear Dr. Gonzalez,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Piers D. L. Howe

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Piers D L Howe

8 Mar 2022

PONE-D-21-18240R2

Self-selected interval judgments compared to point judgments: A weight judgment experiment in the presence of the size-weight illusion.

Dear Dr. Gonzalez:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Piers D. L. Howe

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Fig. Illustration (1 of 3) of each participant’s average judgments for each of the boxes and judgment methods.

    The figure is conceptually the same as Fig 2, which introduces the Results section. Points indicate point judgments, intervals indicate the range between the upper and lower interval limits, and the color darkens for each consecutive split (i.e., the darkest middle field indicates the upper and lower limits after the final interval split). True weights are shown on the x-axis and judgments on the y-axis; thus, the diagonal line indicates accurate judgments. In the title of each individual plot, we have added the number of size-weight illusion cases for Point (P), Mid1 (M1), Mid4 (M4) and no interval overlap (IO).

    (TIFF)

    S2 Fig. Illustration (2 of 3) of each participant’s average judgments for each of the boxes and judgment methods.

    (TIFF)

    S3 Fig. Illustration (3 of 3) of each participant’s average judgments for each of the boxes and judgment methods.

    (TIFF)

    S4 Fig. Average judgments for each of the trial sessions.

    (TIFF)

    S5 Fig. Average judgments relative to the true weight of the box for each of the sessions.

    (TIFF)

    S6 Fig. Average log judgments and log true weights.

    (TIFF)

    S7 Fig. Density derived from judgments relative to the true density of the boxes.

    To illustrate how the judgments relate to density depending on box size, we first computed the density derived from judgments by dividing weight judgments by the true volume of the judged box. We then calculated the derived judged density relative to the true density. This was done to be able to visualize the full range of judgments in a single plot. Furthermore, with this calculation, a value of 1.0 means that the derived judged density is the same as the true density. The left panel illustrates average judgments for point judgments (solid dots) and mid4 (empty squares), along with the average judged upper and lower interval limits (vertical lines showing the range of the interval). The right panel illustrates linear regression lines fitted to the same data used for the left panel. The solid lines are fitted to point judgments and the dotted lines are fitted to the mid4 judgments. The figure shows that the expected densities for the different boxes are not the same, because the regression lines intersect 1.0 on the y-axis at different points for each of the box sizes: large (blue), medium (red) and small (green). The average expected density for each box size is indicated by the value on the x-axis where the regression line crosses the horizontal line at 1.0 on the y-axis.

    (TIFF)
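
The relative-density calculation described for S7 Fig can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `relative_density`, its parameter names, and the example numbers are all hypothetical and are not taken from the study's data. Because the same true volume enters both densities, the ratio reduces algebraically to judged weight divided by true weight.

```python
def relative_density(judged_weight_g, true_weight_g, volume_cm3):
    """Return the density derived from a weight judgment, relative to the true density.

    Hypothetical helper for illustration; a value of 1.0 means the derived
    judged density equals the true density.
    """
    if true_weight_g <= 0 or volume_cm3 <= 0:
        raise ValueError("true weight and volume must be positive")
    # Density derived from the judgment: judged weight over the TRUE volume.
    judged_density = judged_weight_g / volume_cm3
    # True density of the box.
    true_density = true_weight_g / volume_cm3
    # The volume cancels here, so this equals judged_weight_g / true_weight_g.
    return judged_density / true_density


# Made-up example values (not from the study): a 500 cm^3 box that
# weighs 250 g and is judged to weigh 375 g.
print(relative_density(375.0, 250.0, 500.0))  # 1.5
```

Values above 1.0 correspond to overestimated weight (and hence overestimated density), and values below 1.0 to underestimation.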

    S1 Table. Model estimates: Mixed effect model illustrated in Fig 3.

    True weights of the boxes and judgment estimates were log transformed for this analysis. The model included main effects of true box weight, type of judgment estimate (point vs mid4), and size (small, medium, large); first-order interaction effects between true weight, type of judgment estimate, and size; and random effects for true weight, type of judgment estimate, and size.

    (DOCX)

    Data Availability Statement

    All relevant data are available on Figshare (DOI: 10.17045/sthlmuni.19213245).


    Articles from PLoS ONE are provided here courtesy of PLOS
