Journal of Food Science and Technology. 2021 Jan 3;58(7):2815–2824. doi: 10.1007/s13197-020-04890-9

Study of the influence of line scale length (9 and 15 cm) on the sensory evaluations of two descriptive methods

Aline Iamin Gomide 1, Rita de Cássia dos Santos Navarro Silva 1, Moysés Nascimento 2, Luis Antônio Minim 1, Valéria Paula Rodrigues Minim 1
PMCID: PMC8196144  PMID: 34194115

Abstract

The line scale is widely used in different lengths to quantify the intensity of descriptors in sensory evaluation. Since studies on the effect of its length are still limited, the objective was to determine which variables of descriptive sensory evaluation are influenced when different scale lengths are used in two different methods: Optimized Descriptive Profile (ODP) (low degree of training) and Conventional Profile (CP) (high degree of training). Five chocolate samples were evaluated by two panels, one using the 9 cm and the other using the 15 cm line scale. Each panel performed the sensory analysis using the ODP and then the CP method. The following criteria were investigated: interaction between sample and evaluator, discriminative capacity, repeatability of results, and frequency of score use on the unstructured scale. The influence of scale length on sensory responses was similar in the two methods (ODP and CP). When comparing the two scales in both methods, the 15 cm scale resulted in improved discriminative capacity and reduced interaction, and the evaluators tended to distribute their ratings more evenly across this scale length. The repeatability of results showed a slight tendency to be better on the 9 cm scale.

Keywords: Line scale, Scale length, Conventional profile, Optimized descriptive profile

Introduction

Classical descriptive analysis consists of a complete qualitative and quantitative description of the sensory characteristics of food products by a trained panel (Varela and Ares 2012). The Quantitative Descriptive Analysis (QDA) is one of the best-known classical methods. In recent years, generic methodologies including the Conventional Profile (CP) have been extensively used due to their greater freedom of application (Murray et al. 2001).

Because these analyses are time-consuming, many alternative methodologies have been developed to eliminate the long training stage. However, these faster methods generally provide only qualitative data. The Optimized Descriptive Profile (ODP), proposed by Silva et al. (2012), stands out among alternative methods due to its ability to also provide quantitative data.

In the quantitative description, the evaluator expresses the intensity of each qualitative term for a specific food, which supports applications in quality control and formulation optimization, as well as the correlation of sensory and instrumental measurements (Meilgaard et al. 2006).

Descriptive analyses commonly use three types of intensity scales: magnitude estimation, category and line. The selection of the scale depends on the method to be used. The advantage of the line scale is the absence of any numerical values associated with the response and the limited use of words, which minimizes potential tendencies among evaluators to avoid or to prefer specific numbers or expressions (Minim and Silva 2016). Furthermore, it offers a large number of positions (within the constraints of the actual length of the line) at which to indicate the intensity of the sensory attribute. Because the line scale is a type of interval scale, most statistical procedures can be used for its analysis, including means, standard deviations, t-tests, analysis of variance and others (Stone and Sidel 2004).

Due to its advantages, the line scale was recommended for QDA and subsequently adopted in other classical methods such as the Free-Choice Profile (1984) and Spectrum (1991). Since the advent of the QDA, the line scale has been used in many generic methods (Dairou and Sieffermann 2002; Ginés et al. 2004; Blancher et al. 2007; Brannan 2009; Silva et al. 2012) and was selected for the ODP method.

On the line scale the intensity of an attribute generally increases from left to right, with extreme values anchored by terms that represent “weak” and “strong” intensity of the stimulus. The evaluator’s task is to make a mark on the scale that reflects the intensity of the attribute evaluated (Stone and Sidel 2004).

The line scale has been used in different lengths. The ODP and QDA recommended the use of a 9 and a 15 cm scale, respectively. Generic methods are more flexible, including studies that used 9 cm (Wszelaki et al. 2005; Silva et al. 2012; Castilhos et al. 2020), 10 cm (Dairou and Sieffermann 2002; Blancher et al. 2007; Picouet et al. 2019; Jeyaprakash et al. 2020), 12 cm (Ginés et al. 2004) and 15 cm line scales (Brannan 2009; Mielby et al. 2014; Sharma et al. 2017).

The scale length and number of scale categories are major variables that affect scale sensitivity (Stone and Sidel 2004). According to Stone and Sidel (2004), a three-point scale is less sensitive than a five-point scale (by about 30%) and both are less sensitive than a seven- or nine-point scale. Regarding the line scale, Stone and Sidel (2004) mentioned that, in a limited study, extending the scale from 15 to 20 cm did not increase the sensitivity, whereas shortening it to less than 15 cm reduced the sensitivity.

According to Park et al. (2007), the number of categories to be chosen for the category scale would depend on the number of different stimuli to be assessed. Enough categories should be available to represent accurately the perceived spacing between the ranks. For a 9-point scale, a given judge would only be able to represent the spacing between perhaps four or five products for a single experimental session, while data for the other products would not be useful because they would be ‘bunched’ together (Park et al. 2007).

Regarding the line scale, the appropriate scale length would likewise be expected to depend on the number of samples. A large number of samples would call for a longer scale, but what about a small number of samples? Does the scale length make any difference? Even for a small number of samples, studies have adopted different scale lengths, which shows that there is no consensus in the literature. For example, a 9 cm scale was used to evaluate five (Silva et al. 2012) and six samples (Wszelaki et al. 2005), while Hong et al. (2010) and Lee and Vickers (2010) used a longer scale (15 cm) to evaluate almost the same numbers of samples, four and six, respectively.

Several studies have compared different types of scaling such as category scales, line scales and magnitude estimation (Shand et al. 1985; Lawless and Malone 1986; Purdy et al. 2002; Jeon et al. 2004; Silva et al. 2013a; Gamba et al. 2020), but few studies have focused on comparing different lengths of line scale (Carlin et al. 1956; Jeon et al. 2004; Park et al. 2007). Additionally, the few studies that compared different scale lengths have focused on counting scaling errors based on the inversion of stimuli, on how a judge generates numbers in response to a set of intensities, and on some other aspects (Carlin et al. 1956; Jeon et al. 2004; Park et al. 2007), without using statistical tests to measure the effect of scale length on sensory responses.

Given the scarcity of studies, more work on this subject is needed. Therefore, this study aims to compare the influence of two different scale lengths on some variables of descriptive sensory evaluation, such as sample × evaluator interaction, discrimination of samples and repeatability of results, by means of statistical tests, and also to compare the frequency of score use on these unstructured scales.

According to Meilgaard et al. (2006), besides adequate scale selection, the validity and reliability of intensity measurements depend on the training of the evaluators. Training is therefore an important aspect to consider when evaluating the influence of line scale length. Thus, this work aims to study the influence of line scale length on sensory responses in one descriptive method with a low degree of training (ODP, which provides quantitative data) and in another with a high degree of training (CP, which is widely used with freedom of application), considering five samples in the evaluations.

Material and methods

The two descriptive methods (ODP and CP) were performed using two scales: a 9 cm line scale (because it is already recommended for the ODP) and a 15 cm line scale (commonly used in descriptive methods and recommended for the QDA). The influence of scale length was evaluated separately for each sensory method. Subsequently, it was verified which of the methods is more sensitive to the variation in scale length.

Samples

Five chocolate samples were used as food matrices and were defined by preliminary tests. The five samples were composed of different proportions of milk and bittersweet chocolate, respectively: 90:10; 70:30; 50:50; 30:70; 10:90. The milk and bittersweet chocolate contained 35% and 70% of cocoa, respectively. Each chocolate unit measured approximately 30 mm in diameter and 20 mm in height and was prepared by a local company (Viçosa, MG, Brazil).
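As a rough illustration of the gradient these blends span, the nominal cocoa content of each sample can be estimated as a weighted average of the two base chocolates. This is only a sketch: it assumes that the stated proportions are mass fractions and that cocoa content mixes linearly, the function name is hypothetical, and the resulting values are not reported in the study.

```python
# Nominal cocoa content of the two base chocolates, as stated in the text (%).
MILK_COCOA = 35
BITTERSWEET_COCOA = 70

# Milk:bittersweet proportions of the five samples (labelled F1 to F5 in Table 2).
BLENDS = {"F1": (90, 10), "F2": (70, 30), "F3": (50, 50), "F4": (30, 70), "F5": (10, 90)}


def nominal_cocoa_percent(milk_pct, bittersweet_pct):
    """Weighted-average cocoa content (assumption: linear mixing by mass)."""
    total = milk_pct + bittersweet_pct
    return (milk_pct * MILK_COCOA + bittersweet_pct * BITTERSWEET_COCOA) / total


cocoa = {name: nominal_cocoa_percent(*blend) for name, blend in BLENDS.items()}
for name, value in cocoa.items():
    print(name, value)
```

Under these assumptions the five samples step from 38.5% to 66.5% nominal cocoa in equal increments of 7 percentage points.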

Sensory evaluation

The sensory characterization of chocolates was conducted in a laboratory of the Federal University of Viçosa and was approved by the ethics committee of the institution (17104913.4.00005153). It was performed by two sensory descriptive methods: CP and ODP. The stages common to both methods were: recruitment, pre-selection, determination of descriptive terminology and familiarization of evaluators with the reference materials. An overview of the experimental design is shown in Fig. 1.

Fig. 1. Flow diagram of the experiment

A total of 64 candidates were recruited using questionnaires, as proposed by Meilgaard et al. (2006). The pre-selection of recruited candidates consisted of a sequence of four triangular tests. The criterion for selection was a correct response in at least 75% of the tests, as recommended by Meilgaard et al. (2006). The sensory attributes were defined by the previous list technique, as proposed by Damasio and Costell (1991), using lists obtained from Minim et al. (2000) and Silva et al. (2013b). Eight attributes were defined by consensus: brown color, cocoa mass aroma, cocoa mass flavor, sweetness, residual bitterness, hardness, spreadability and adhesivity. Next, with the assistance of the evaluators, the reference materials (“weak” and “strong”) of each attribute were defined.
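To give a sense of how strict the pre-selection criterion is, the chance that a candidate passes by guessing alone can be worked out from the binomial distribution. The sketch below assumes that the 75% criterion means at least three correct answers out of four, that the tests are independent, and that the probability of a correct guess in a triangular test is 1/3; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def chance_of_passing(n_tests=4, min_correct=3, p_guess=Fraction(1, 3)):
    """Probability of at least `min_correct` correct answers by pure guessing."""
    return sum(
        comb(n_tests, k) * p_guess**k * (1 - p_guess) ** (n_tests - k)
        for k in range(min_correct, n_tests + 1)
    )


p_pass = chance_of_passing()
print(p_pass, float(p_pass))
```

Under these assumptions a pure guesser passes with probability 1/9, or about 11%.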

For the familiarization step the descriptive terms and their reference materials were presented to the evaluators in individual booths during one session. The evaluators were instructed to read the attribute definitions and to taste the references.

Subsequently, the evaluators were randomly distributed into two teams of 20 evaluators each, satisfying the minimum of 16 proposed by Silva et al. (2014b) for the ODP method. One panel evaluated the samples according to the ODP method followed by the CP, using the 9 cm line scale (panel 1). The other performed the same procedure using the 15 cm line scale (panel 2). The scales were presented on printed ballots.

The initial steps common to both methods (ODP and CP) were executed by the two panels together in order to obtain consensus in the evaluation of the sensory attributes, allowing for subsequent comparison between the techniques.

Optimized descriptive profile (ODP)

The evaluators of panels 1 and 2 began assessing the chocolate samples according to the attribute-by-attribute protocol, as recommended by Silva et al. (2012). Therefore, only one attribute was evaluated per session and all the samples were presented together with the reference materials of the attribute to be evaluated.

Panel 1 evaluated the samples using the 9 cm line scale and panel 2 using the 15 cm scale, thus generating data for the two techniques (ODP-9 and ODP-15). Evaluations were performed using the Balanced Block Design (BBD). Thus, each evaluator represented a block and assessed the samples with three repetitions for each attribute. The number of sessions required corresponded to the number of attributes (eight) multiplied by the number of repetitions (three), totaling 24 sessions.

Conventional profile (CP)

After performing all ODP steps, the evaluators of both panels were properly trained for subsequent evaluation of products by the CP. Thus, the evaluation step of the ODP served as a pre-training of the panels, as performed by Silva et al. (2012).

Training consisted of several exercises, including ordering tests, recognition of reference materials and allocation of sensory attribute intensity on the line scale, as performed by Simiqueli et al. (2015). The assessors underwent training exercises for about two months. After that, the evaluators performed a preliminary test to verify whether they were adequately trained. For this purpose, the final evaluation step of the CP was simulated with four repetitions, according to the BBD, using two chocolate samples (one composed of 70% milk chocolate + 30% bittersweet chocolate and the other of 30% milk chocolate + 70% bittersweet chocolate). Panel 1 performed the preliminary tests using the 9 cm line scale and panel 2 used the 15 cm line scale. Analyses of variance (ANOVA) were performed per attribute for each evaluator. Evaluators who showed discriminatory capacity and reproducibility for all attributes (p.Fsample < 0.3 and p.Frepetition > 0.05) were selected, following the selection parameters used by Silva et al. (2012). Thus, of the 20 evaluators in each panel, 16 met the selection parameters and were selected to make up the sensory team. To permit comparison between the ODP and CP methods, the ODP results of panel 1 and panel 2 were analyzed using only the same 16 evaluators selected for the CP.
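The selection rule described above can be sketched as a simple filter over each evaluator's per-attribute p-values. The function name, the data layout and the p-values below are hypothetical and serve only to illustrate the two thresholds stated in the text (p.Fsample < 0.3 and p.Frepetition > 0.05 for every attribute).

```python
def meets_selection_criteria(p_sample, p_repetition,
                             max_p_sample=0.3, min_p_repetition=0.05):
    """Return True if the evaluator passes both criteria for every attribute.

    p_sample and p_repetition map attribute name -> p-value from that
    evaluator's own ANOVA (sample effect and repetition effect).
    """
    discriminates = all(p < max_p_sample for p in p_sample.values())
    reproducible = all(p > min_p_repetition for p in p_repetition.values())
    return discriminates and reproducible


# Hypothetical p-values for two attributes, for illustration only.
evaluator_a = ({"sweetness": 0.02, "hardness": 0.21},
               {"sweetness": 0.40, "hardness": 0.09})
evaluator_b = ({"sweetness": 0.02, "hardness": 0.35},   # fails discrimination
               {"sweetness": 0.40, "hardness": 0.09})

print(meets_selection_criteria(*evaluator_a))
print(meets_selection_criteria(*evaluator_b))
```

An evaluator is retained only when both conditions hold for all attributes, so a single attribute outside either threshold is enough for exclusion.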

Lastly, the trained and selected evaluators analyzed the test-chocolates according to the BBD. Each evaluator randomly analyzed the five samples in one session in relation to all attributes, without the presence of reference materials and in a monadic way. Three repetitions were conducted, resulting in 3 evaluation sessions. Panel 1 evaluated the samples using the 9 cm line scale (CP-9) and panel 2 using the 15 cm line scale (CP-15).

Statistical analysis

The data obtained by the four techniques (ODP-9, ODP-15, CP-9 and CP-15) were individually analyzed in relation to the effect of samples × evaluator interaction, discrimination of samples, repeatability of results and frequency of score use on the line scale. The specific techniques compared were: (i) ODP-9 and ODP-15, and (ii) CP-9 and CP-15 in order to evaluate the effect of scale length in both methods and to determine which method was more influenced by the scale length with regards to each criterion studied.

Effect of sample × evaluator interaction

The effect of interaction was determined by ANOVA with two sources of variation (sample and evaluator) and sample × evaluator interaction. Eq. (1) shows the mathematical model.

$$Y_{ijk} = m + T_i + B_j + (TB)_{ij} + e_{ijk} \qquad (1)$$

where:

  • $Y_{ijk}$ = score of sample i attributed by evaluator j in repetition k;

  • $m$ = constant inherent to the model or general average;

  • $T_i$ = fixed effect of sample i;

  • $B_j$ = random effect of evaluator j;

  • $(TB)_{ij}$ = effect of sample × evaluator interaction;

  • $e_{ijk}$ = random error, independent and identically normally distributed, $N(0, \sigma^2)$.

Significance (p < 0.05) of the interaction effect was determined for each sensory attribute by the F-test. The technique which presented the most attributes with significant effect was considered the technique with the greatest interaction.
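For a balanced design, the F-ratios implied by Eq. (1) can be computed directly from sums of squares. The following is a minimal sketch, not the SAS procedure used in the study: the function name, the data layout and the toy scores are hypothetical, and p-values (which require the F distribution) are left to statistical software. It returns the interaction F-ratio and, for reference, the sample F-ratio against both the error and the interaction mean squares.

```python
from itertools import product


def two_way_anova(scores):
    """F-ratios for a balanced sample x evaluator design.

    scores[i][j] is the list of K replicate scores given to sample i
    by evaluator j (every cell must have the same K >= 2).
    """
    I, J, K = len(scores), len(scores[0]), len(scores[0][0])
    cells = list(product(range(I), range(J)))
    grand = sum(sum(scores[i][j]) for i, j in cells) / (I * J * K)
    sample_mean = [sum(sum(scores[i][j]) for j in range(J)) / (J * K) for i in range(I)]
    eval_mean = [sum(sum(scores[i][j]) for i in range(I)) / (I * K) for j in range(J)]
    cell_mean = [[sum(scores[i][j]) / K for j in range(J)] for i in range(I)]

    ss_sample = J * K * sum((m - grand) ** 2 for m in sample_mean)
    ss_inter = K * sum(
        (cell_mean[i][j] - sample_mean[i] - eval_mean[j] + grand) ** 2
        for i, j in cells
    )
    ss_error = sum((y - cell_mean[i][j]) ** 2 for i, j in cells for y in scores[i][j])

    ms_sample = ss_sample / (I - 1)
    ms_inter = ss_inter / ((I - 1) * (J - 1))
    ms_error = ss_error / (I * J * (K - 1))
    return {
        "F_interaction": ms_inter / ms_error,
        "F_sample_vs_error": ms_sample / ms_error,
        "F_sample_vs_interaction": ms_sample / ms_inter,
        "df_interaction": ((I - 1) * (J - 1), I * J * (K - 1)),
    }


# Toy data: 2 samples x 2 evaluators x 2 repetitions (hypothetical scores).
toy = [[[1, 3], [2, 4]],
       [[6, 8], [5, 7]]]
result = two_way_anova(toy)
print(result)
```

With five samples, 16 evaluators and three repetitions, the interaction F-ratio in this layout would have (5 − 1)(16 − 1) = 60 and 5 × 16 × (3 − 1) = 160 degrees of freedom.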

Discriminative capacity of the evaluators

The effect of the samples (F-test) was determined for each attribute by ANOVA (Eq. 1). In the case of a significant interaction effect, the Fsample was calculated by using the Mean Square of interaction as the denominator, as recommended by Stone and Sidel (2004).

In the case of a significant difference between samples, the ANOVA was followed by the Tukey test. The statistical procedures (F-test and Tukey test) were performed considering a level of 5% of significance. The technique which formed the most groups for one specific attribute was considered the technique with the greatest discriminative capacity.
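The group count used for this comparison can be read off the Tukey letters. A minimal sketch, assuming that the number of groups equals the number of distinct letters appearing among the five sample means of one attribute (the function name is hypothetical; the letter patterns are taken from Table 2):

```python
def count_groups(letters):
    """Number of distinct Tukey letters across the samples of one attribute.

    `letters` holds one label per sample, e.g. "a", "ab" or "bc".
    """
    return len(set("".join(letters)))


# Letter patterns for ODP-9, samples F1 to F5 (from Table 2).
hardness = ["d", "c", "b", "ab", "a"]
spreadability = ["a", "a", "b", "bc", "c"]
adhesivity = ["a", "a", "b", "b", "b"]

print(count_groups(hardness), count_groups(spreadability), count_groups(adhesivity))
```

Counting distinct letters treats a mean labelled "ab" as belonging to existing groups rather than forming a new one, which matches the group counts reported in the Results.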

Repeatability of the results

To assess the repeatability of the panels, an ANOVA (10% significance level) was conducted considering an error among the evaluations ($e_{ij}$) and an error within the evaluations ($\varepsilon_{ijk}$). This analysis was performed for each attribute and technique separately, considering the three repetitions of the same sample. It tested whether there was a significant effect between successive evaluations by the same panel (effect of repetition, $e_{ij}$). The mathematical model for this analysis, following Barbin (1993), is shown in Eq. (2). The same model was used by Silva et al. (2014a) to assess the capacity of the panel to repeat the results in the ODP.

The null hypothesis of zero variability among evaluation repetitions ($\sigma_e^2 = 0$) was tested. Because it is desirable not to reject the null hypothesis for this criterion, a higher significance level (10%) was used than in the other tests. This reduces the probability of a type II error (failing to reject the null hypothesis when it is false), making the criterion more rigorous. The technique with the most attributes showing no significant effect was considered the technique with the greatest repeatability.

$$Y_{ijk} = m + T_i + B_j + e_{ij} + \varepsilon_{ijk} \qquad (2)$$

where:

  • $Y_{ijk}$ = score of sample i attributed by evaluator j in repetition k;

  • $m$ = constant inherent to the model or general average;

  • $T_i$ = fixed effect of sample i;

  • $B_j$ = random effect of evaluator j;

  • $e_{ij}$ = random effect of repetitions for evaluation of the same sample;

  • $\varepsilon_{ijk}$ = random error, independent and identically normally distributed, $N(0, \sigma^2)$.
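One way to read the test of $\sigma_e^2 = 0$ under Eq. (2) is as a ratio of two mean squares. The block below is a sketch under assumptions not stated in the text: a balanced design with $I$ samples, $J$ evaluators and $K$ repetitions, and the symbols $MS_e$, $MS_\varepsilon$, $SS_e$ and $SS_\varepsilon$ introduced here for illustration.

```latex
% Assumed balanced design: I samples, J evaluators, K repetitions.
\[
F = \frac{MS_e}{MS_\varepsilon}
  = \frac{SS_e / \left[(I-1)(J-1)\right]}{SS_\varepsilon / \left[IJ(K-1)\right]}
  \;\sim\; F_{(I-1)(J-1),\; IJ(K-1)}
  \quad \text{under } H_0 : \sigma_e^2 = 0 .
\]
% With I = 5, J = 16, K = 3:
\[
(I-1)(J-1) = 4 \times 15 = 60, \qquad IJ(K-1) = 5 \times 16 \times 2 = 160 .
\]
```

Together with the 4 and 15 degrees of freedom for samples and evaluators, these add up to the 239 total degrees of freedom of 240 observations per attribute.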

Frequency of score use on the line scale

This analysis sought to verify which range of the scale each panel used with the greatest frequency to indicate the intensity of each attribute in each technique. To enable comparison, the data were first corrected by dividing the individual scores by the scale length used in the evaluation. Thus, the individual scores obtained from the ODP-9 and CP-9 techniques were divided by 9 and those from the other techniques by 15. This correction was necessary since 1 cm represents a higher proportion of the 9 cm scale (0.11, about 11%) than of the 15 cm scale (0.07, about 7%). Thus, all scales were standardized to the range 0–1 and the frequency of use of each 0.1 interval of the standardized scale was determined.
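The standardization and binning described above can be sketched as follows. The function name and the example marks are hypothetical, and the bin convention (ten half-open intervals of width 0.1, with a score of exactly 1 placed in the last interval) is an assumption, since the text does not state how boundary values were handled.

```python
def usage_frequency(scores_cm, scale_length_cm, n_bins=10):
    """Relative frequency of use of each interval of the standardized scale."""
    counts = [0] * n_bins
    for score in scores_cm:
        standardized = score / scale_length_cm          # now in the range 0-1
        index = min(int(standardized * n_bins), n_bins - 1)
        counts[index] += 1
    return [count / len(scores_cm) for count in counts]


# Hypothetical marks (in cm) from a 9 cm and a 15 cm ballot.
freq_9 = usage_frequency([0.5, 8.6, 4.4, 0.3], scale_length_cm=9)
freq_15 = usage_frequency([0.9, 14.2, 7.4, 3.1], scale_length_cm=15)

print(round(1 / 9, 2), round(1 / 15, 2))  # share of the scale covered by 1 cm
print(freq_9)
print(freq_15)
```

Dividing by the scale length before binning is what makes the two ballots comparable: the same 0.1 interval corresponds to 0.9 cm on the shorter scale and 1.5 cm on the longer one.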

Software

The analysis of variance and the means tests were performed using the SAS (Statistical Analysis System), version 9.1, licensed to the Universidade Federal de Viçosa.

Results

Study of sample × evaluator interaction

In all techniques, a significant interaction effect (p < 0.05) was observed in the F-test for at least one attribute (Table 1). The existence of interaction indicates that at least one evaluator is assessing the samples differently from the rest of the panel. This is a common occurrence in sensory analysis and is difficult to control (Silva and Damásio 1994).

Table 1. p-values of the analysis of variance for sample × evaluator interaction

Attributes ODP-9 ODP-15 CP-9 CP-15
Brown color 0.077ns 0.701ns 0.008* 0.859ns
Cocoa aroma 0.078ns 0.051ns 0.001* 0.501ns
Cocoa flavor 0.001* 0.074ns 0.003* 0.058ns
Sweetness 0.112ns 0.100ns 0.028* 0.496ns
Residual bitterness 0.002* <0.001* 0.003* 0.136ns
Hardness 0.040* 0.509ns <0.001* 0.306ns
Spreadability 0.400ns 0.518ns 0.021* 0.045*
Adhesivity 0.001* 0.057ns 0.012* 0.075ns

*p-value significant at 5% probability; ns: not significant at 5% probability

A reduction in interaction was noted when the larger scale was used. In the ODP-9 the interaction was significant for four attributes (cocoa flavor, residual bitterness, hardness and adhesivity) and in the ODP-15 for only one attribute (residual bitterness).

A similar result was observed in the CP method, however in a more pronounced way. In the CP-9 the interaction was significant for all attributes, while in CP-15 a drastic reduction was observed, where interaction was significant for only one descriptive term (spreadability).
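The counts quoted above follow directly from Table 1. As a small check, the sketch below tallies the attributes with p < 0.05 for each technique; the p-values are transcribed from Table 1, while the function name and the handling of "<0.001" (treated as its upper bound) are assumptions made for illustration.

```python
TECHNIQUES = ("ODP-9", "ODP-15", "CP-9", "CP-15")

# p-values for sample x evaluator interaction, transcribed from Table 1.
TABLE_1 = {
    "Brown color":         ("0.077", "0.701", "0.008", "0.859"),
    "Cocoa aroma":         ("0.078", "0.051", "0.001", "0.501"),
    "Cocoa flavor":        ("0.001", "0.074", "0.003", "0.058"),
    "Sweetness":           ("0.112", "0.100", "0.028", "0.496"),
    "Residual bitterness": ("0.002", "<0.001", "0.003", "0.136"),
    "Hardness":            ("0.040", "0.509", "<0.001", "0.306"),
    "Spreadability":       ("0.400", "0.518", "0.021", "0.045"),
    "Adhesivity":          ("0.001", "0.057", "0.012", "0.075"),
}


def significant_attributes(table, alpha=0.05):
    """Map each technique to the attributes whose p-value is below alpha."""
    significant = {technique: [] for technique in TECHNIQUES}
    for attribute, p_values in table.items():
        for technique, p_text in zip(TECHNIQUES, p_values):
            if float(p_text.lstrip("<")) < alpha:   # "<0.001" -> upper bound 0.001
                significant[technique].append(attribute)
    return significant


significant = significant_attributes(TABLE_1)
for technique in TECHNIQUES:
    print(technique, len(significant[technique]), significant[technique])
```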

Discriminative capacity of evaluators

For all techniques the Fsample test was significant (p.Fsample < 0.01) for all attributes. Thus, the Tukey test was performed and the results are listed in Table 2.

Table 2. Mean scores (± standard deviation) of the sensory attributes of chocolate in the four techniques, compared by the Tukey test (α = 0.05)

Columns (sensory attributes, in order): Brown color; Cocoa aroma; Cocoa flavor; Sweetness; Residual bitterness; Hardness; Spreadability; Adhesivity

ODP, 9 cm scale (ODP-9)

F1 0.7 ± 0.7e 0.9 ± 0.9d 0.7 ± 0.7d 7.9 ± 1.5a 0.6 ± 0.6e 1.4 ± 1.4d 7.7 ± 1.6a 7.5 ± 2.0a
F2 2.7 ± 1.2d 2.7 ± 1.7c 2.3 ± 1.4c 6.6 ± 1.8b 2.2 ± 1.2d 3.4 ± 2.1c 6.8 ± 1.5a 6.1 ± 2.6a
F3 5.5 ± 1.4c 5.1 ± 2.4b 4.4 ± 2.0b 3.6 ± 1.8c 4.5 ± 1.9c 6.1 ± 2.1b 3.6 ± 1.8b 3.6 ± 2.4b
F4 7.1 ± 1.4b 7.0 ± 1.6a 6.3 ± 2.0a 1.9 ± 1.5d 6.4 ± 1.8b 7.3 ± 1.6ab 2.6 ± 1.9bc 1.7 ± 1.2b
F5 8.3 ± 0.8a 8.1 ± 1.6a 7.5 ± 1.7a 0.9 ± 0.6d 8.0 ± 1.0a 7.8 ± 1.3a 1.5 ± 0.9c 1.8 ± 1.5b

ODP, 15 cm scale (ODP-15)

F1 1.3 ± 1.1e 1.9 ± 1.5d 1.5 ± 1.4e 13.1 ± 1.8a 1.2 ± 1.0e 3.3 ± 2.8d 12.8 ± 2.0a 12.6 ± 2.3a
F2 4.4 ± 1.8d 4.2 ± 2.7c 4.0 ± 2.7d 10.3 ± 3.1b 3.2 ± 2.3d 5.7 ± 3.5c 11.3 ± 2.1a 10.5 ± 3.1b
F3 7.7 ± 2.1c 8.1 ± 3.1b 7.5 ± 3.1c 7.0 ± 3.1c 6.8 ± 3.2c 9.0 ± 3.4b 7.2 ± 3.2b 7.4 ± 3.4c
F4 11.8 ± 1.7b 10.9 ± 2.5a 10.4 ± 2.5b 4.4 ± 2.8d 10.2 ± 3.5b 10.7 ± 3.1ab 4.6 ± 3.2c 4.5 ± 3.1d
F5 13.6 ± 1.1a 12.7 ± 2.4a 12.6 ± 2.5a 2.3 ± 2.3e 12.6 ± 2.0a 12.2 ± 2.8a 2.8 ± 2.3d 2.7 ± 2.5d

CP, 9 cm scale (CP-9)

F1 0.8 ± 0.9d 0.6 ± 0.6d 0.5 ± 0.5c 8.2 ± 1.3a 0.4 ± 0.5d 1.3 ± 1.4c 8.1 ± 1.0a 8.0 ± 1.2a
F2 2.3 ± 1.9c 2.1 ± 1.9c 1.7 ± 1.3c 7.2 ± 1.7a 1.4 ± 1.3d 2.2 ± 1.6c 7.0 ± 1.9a 7.0 ± 1.9a
F3 4.5 ± 2.2b 4.3 ± 2.7b 4.1 ± 2.5b 4.7 ± 2.5b 4.0 ± 2.6c 4.5 ± 2.7b 4.9 ± 2.5b 4.8 ± 2.5b
F4 6.5 ± 1.8a 6.5 ± 2.0a 6.4 ± 2.4a 2.2 ± 1.6c 6.2 ± 2.5b 6.9 ± 1.9a 2.9 ± 2.2c 2.2 ± 1.9c
F5 7.7 ± 1.1a 7.5 ± 1.5a 7.7 ± 1.6a 1.2 ± 1.2c 7.6 ± 1.7a 6.9 ± 2.3a 1.8 ± 2.0c 1.6 ± 1.6c

CP, 15 cm scale (CP-15)

F1 3.0 ± 2.2e 2.4 ± 2.2e 1.9 ± 1.7e 12.6 ± 2.7a 1.4 ± 1.3c 3.6 ± 3.2c 12.4 ± 2.3a 12.2 ± 2.5a
F2 5.3 ± 3.2d 4.9 ± 3.6d 3.9 ± 3.4d 11.0 ± 3.4a 3.1 ± 3.5c 5.2 ± 3.3c 10.7 ± 3.6a 10.2 ± 3.9b
F3 8.8 ± 3.0c 8.7 ± 3.5c 8.3 ± 3.7c 6.3 ± 3.5b 7.5 ± 4.2b 8.5 ± 3.3b 6.9 ± 3.7b 5.5 ± 3.6c
F4 11.3 ± 2.9b 11.0 ± 3.1b 11.0 ± 3.5b 2.9 ± 2.7c 11.2 ± 3.6a 10.9 ± 3.0a 4.8 ± 3.3c 3.9 ± 3.4c
F5 13.1 ± 1.5a 12.9 ± 1.4a 13.1 ± 1.7a 1.4 ± 1.1c 12.5 ± 2.8a 12.1 ± 3.0a 3.1 ± 3.1c 1.9 ± 2.0d

a,b,c,d,e: letters obtained by the Tukey test. Within each technique, means followed by the same letter in the same column do not differ at 5% probability

F1: chocolate sample composed by 90% of milk chocolate + 10% of bittersweet chocolate

F2: chocolate sample composed by 70% of milk chocolate + 30% of bittersweet chocolate

F3: chocolate sample composed by 50% of milk chocolate + 50% of bittersweet chocolate

F4: chocolate sample composed by 30% of milk chocolate + 70% of bittersweet chocolate

F5: chocolate sample composed by 10% of milk chocolate + 90% of bittersweet chocolate

The evaluators were able to detect significant differences between all five samples for two attributes (brown color and residual bitterness) in ODP-9. In ODP-15 this occurred for four attributes (brown color, cocoa flavor, sweetness and residual bitterness), while for the other attributes, the samples were separated into four distinct groups (minimum number of groups observed for this technique). In ODP-9 the presence of attributes with less discrimination was observed. For spreadability the samples were separated into three distinct groups and for adhesivity into only two.

In CP-9, the samples were discriminated into four groups for three attributes (brown color, cocoa aroma and residual bitterness) and the formation of five groups was not observed. For the five other attributes (more than half), the samples were separated into only three groups. Differently from CP-9, in the CP-15 five groups were formed for three attributes (brown color, cocoa aroma, cocoa flavor). With regards to adhesivity, the samples were discriminated into four groups and for the others only three groups were formed.

Thus, discrimination tended to increase when the 15 cm scale was used in both methods. Compared to the ODP-9 technique, the ODP-15 resulted in an increase of one discrimination group for three attributes (cocoa flavor, sweetness and spreadability) and of two groups for one attribute (adhesivity). In the CP a similar behavior was verified, also with an increase of one group for three attributes (brown color, cocoa aroma and adhesivity) and of two groups for one attribute (cocoa flavor). Thus, the effect of scale length on sample discrimination was similar for ODP and CP.

Repeatability of the results

In ODP-9, the evaluators presented repeatability (p > 0.1) for all sensory attributes, while in ODP-15 there was a significant effect of repetitions (p < 0.1) for two attributes, brown color and spreadability (Table 3).

Table 3. Repeatability (eij) of descriptive techniques: p-values of the repetition effect

Attributes ODP-9 ODP-15 CP-9 CP-15
Brown color 0.199ns 0.034* 0.040* 0.012*
Cocoa aroma 0.866ns 0.257ns 0.037* 0.012*
Cocoa flavor 0.211ns 0.591ns 0.184ns 0.050*
Sweetness 0.114ns 0.762ns 0.248ns 0.156ns
Residual bitterness 0.260ns 0.621ns 0.152ns 0.144ns
Hardness 0.466ns 0.637ns 0.607ns 0.384ns
Spreadability 0.369ns 0.001* 0.184ns 0.923ns
Adhesivity 0.446ns 0.458ns 0.244ns 0.878ns

*p-value significant at 10% probability; ns: not significant at 10% probability

Similar behavior was observed for the CP, but it was slightly less pronounced. In CP-9, two attributes presented a significant effect of repetition (brown color and cocoa aroma) versus three in CP-15 (brown color, cocoa aroma and cocoa flavor) (Table 3).

This result shows a slight tendency of evaluators to present greater repeatability on the shorter scale for both methods. In other words, evaluators tended to assign more similar scores across repetitions when the shorter scale was used.

Frequency of score use on the line scale

For all techniques, regardless of scale length, the evaluators used the entire scale to express the intensity of attributes. However, some ranges were used more frequently than others. The evaluators who performed the ODP-9 technique (Fig. 2a) assigned scores in the ranges 0–0.1 and 0.9–1 of the standardized scale with the greatest frequency for all attributes. In the ODP-15 technique (Fig. 2b), the evaluators most frequently used scores in the range 0–0.1 for only three attributes: residual bitterness, cocoa aroma and cocoa flavor. For the latter two, the difference in frequency relative to the other ranges was very small. For the other attributes, no range stood out in frequency of use; instead, the scores were distributed homogeneously over the scale. In general, evaluators therefore tended to use the extremes of the scale (upper as well as lower) less for all attributes when the evaluation was performed on the longer scale.

Fig. 2. Frequency distribution of the sensory scores assigned to the descriptive attributes of chocolates in the ODP method: (a) ODP-9; (b) ODP-15

A similar result was observed for the CP (Fig. 3). In the CP-9 technique (Fig. 3a), the ranges 0–0.1 and 0.9–1 were used with the greatest frequency for all descriptive terms. This was not always the case in the CP-15 technique (Fig. 3b), as can be verified for brown color, cocoa aroma, spreadability and hardness. For the other sensory characteristics in CP-15, although the extremes of the scale were used more than the other ranges, the use of those extremes was more pronounced on the 9 cm scale (Fig. 3a) for the respective attributes.

Fig. 3. Frequency distribution of the sensory scores assigned to the descriptive attributes of chocolates in the CP method: (a) CP-9; (b) CP-15

Discussion

The results showed that some criteria of the descriptive evaluation were influenced when the same number of samples (five) was evaluated on scales of different lengths (9 and 15 cm). These findings are important given that different studies have adopted different scale lengths to evaluate the same number of samples (Wszelaki et al. 2005; Lee and Vickers 2010). For ODP and CP, the interaction was smaller on the longer scale. In sensory analysis, one of the reasons for interaction is the inversion of sensory stimuli perception (Silva and Damásio 1994). The limited space for locating the samples on the 9 cm scale could lead to more confusable scaling, contributing to more inversions of sensory stimuli by evaluators. This hypothesis is reinforced by Jeon et al. (2004), who stated that on shorter scales the evaluators have a greater tendency to assign equal scores, or even to invert them, when evaluating samples with different intensities of a given stimulus. The inversion of stimuli generates a “disturbance”, a source of variation in the system, caused by the fact that evaluators generate particular numerical responses, spacing their scores differently on the scale (Park et al. 2004), which contributed to the increased interaction when the 9 cm scale was used.

In the CP, the decrease in the number of attributes that had a significant effect of interaction on the 15 cm scale was more pronounced than in the ODP. Silva et al. (2012), when comparing the ODP and CP methodologies using a 9 cm line scale, verified a higher tendency of the CP to generate more attributes with significant interactions between samples × evaluators, which agrees with the results obtained in the present study for the 9 cm scale. This observation may be a consequence of the differences between the evaluation protocols of the two methodologies. In the simultaneous protocol (ODP), the evaluators can re-taste and review the scores given to the different samples, which reduces the effect of “forgetting” (a fact observed in the monadic protocol-CP), resulting in a smaller number of errors and consequently less interaction (Jeon et al. 2004; Park et al. 2004).

However, this trend was not observed for the 15 cm scale. No appreciable difference in interaction effect was found between ODP-15 and CP-15. The lower tendency of evaluators to invert scores on a longer scale may have contributed to a smaller interaction on the 15 cm scale, so that it was not influenced by the method used. Therefore, because the CP, when compared to the ODP, presented greater interaction on the 9 cm scale but no appreciable difference on the 15 cm scale (where the interaction effect was lower), the influence of scale length on interaction was more pronounced in this method.

The discriminative capacity was improved when samples were evaluated on the 15 cm scale. According to Pecore et al. (2015), on scales with a limited extent the evaluators are unable to express small differences in intensity between samples. In the present study, when assessing the samples on the 15 cm scale, the evaluators had more available space to indicate the intensity of the attributes, which may have given them a greater chance of adequately representing the “spacing” between the different samples, so that they could express small differences. Furthermore, a reduction of the discriminative capacity observed on the 9 cm scale may also be related to the greater effect of interaction between samples and evaluators, since the variation introduced into the system, caused by the fact that evaluators attribute scores to the samples in a particular way, may imply a reduction in the discriminating power of the samples (Park et al. 2007).

Repeatability of the results tended to be slightly better on the 9 cm scale. The larger the scale, the greater the number of possible locations that the evaluators can use to represent the attribute intensity, which makes it difficult to memorize the markings across repetitions. According to Meilgaard et al. (2006), on larger scales the evaluators have more difficulty remembering the position indicated in the different assessments, which may have diminished their ability to assign similar scores in different repetitions of the same sample.

The influence of scale length on repeatability was slightly more pronounced in the ODP. Because there was no training in the ODP, the panel that used the 15 cm scale may have been more affected by scale size than the panel that used the 9 cm scale. In the CP, the fact that the panels were trained to use the scales allowed them to adapt their use, which may have helped to balance the responses of the panels that performed evaluations on the 9 and 15 cm scales, resulting in small differences in repeatability.
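One way to make the comparison of repeatability between scale lengths concrete is sketched below. It pools the within-cell variation of replicate scores (one cell per evaluator and sample) into a single standard deviation and also expresses it as a fraction of the scale length. This is an illustrative measure chosen here, not necessarily the criterion applied in the present study, and the function name `pooled_repeat_sd` and the scores are hypothetical.

```python
from math import sqrt


def pooled_repeat_sd(cells):
    """Pooled standard deviation of replicate scores.

    cells is a list of replicate-score lists, one list per
    evaluator x sample combination.
    """
    ss = 0.0
    df = 0
    for reps in cells:
        m = sum(reps) / len(reps)
        ss += sum((x - m) ** 2 for x in reps)
        df += len(reps) - 1
    return sqrt(ss / df)


# Hypothetical replicate scores (cm) for three evaluator x sample cells.
panel_9 = [[3.0, 3.4], [6.1, 5.9], [7.8, 8.2]]
panel_15 = [[5.0, 6.0], [10.2, 9.6], [13.0, 13.8]]

sd_9 = pooled_repeat_sd(panel_9)
sd_15 = pooled_repeat_sd(panel_15)

# Dividing by scale length puts both panels on a common 0-1 footing.
rel_9 = sd_9 / 9.0
rel_15 = sd_15 / 15.0
print(round(sd_9, 3), round(sd_15, 3), round(rel_9, 4), round(rel_15, 4))
```

A smaller value indicates that evaluators reproduced their own markings more closely across repetitions; the relative form avoids penalizing the longer scale simply for having more centimetres.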

When studying the usage frequency of scores on the unstructured scale, for all techniques it was observed that the evaluators tended to use the full length of the scale to express the intensity of the attributes. This observation may be a consequence of the “response range equalizing bias” theory, which states that for a given set of stimuli, evaluators tend to distribute their responses across the entire extent of the scale, regardless of its length (Poulton 1973). However, some ranges were used more frequently than others. In both methods, the evaluators who used the 15 cm scale tended to use the extremes of the scale less than the panels that used the 9 cm scale, showing that they tended to distribute their ratings more evenly along the length of the scale. The reduced use of the extreme ranges on the larger scale may reflect a lesser need of the evaluators to use its entire length to represent differences in intensity between the samples, i.e., to express the “correct spacing” between them. On the other hand, the greater use of the extremes on the 9 cm scale may indicate that the evaluators needed more space to separate all samples, and thus more frequently positioned the extreme chocolate samples at the ends of the scale.
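The frequency of score use described above can be sketched as follows: each score is divided by the scale length, assigned to one of several equal ranges, and the share of ratings falling in the two end ranges is compared between scales. The choice of five ranges, the function name `range_usage` and the scores are assumptions made for illustration only and are not taken from the study.

```python
def range_usage(scores, scale_length, n_ranges=5):
    """Proportion of scores falling in each of n_ranges equal ranges."""
    counts = [0] * n_ranges
    for s in scores:
        # Normalize to 0-1, then map to a range index; a mark at the
        # very end of the line is placed in the last range.
        idx = min(int(s / scale_length * n_ranges), n_ranges - 1)
        counts[idx] += 1
    return [c / len(scores) for c in counts]


# Hypothetical scores (cm) from one panel per scale length.
scores_9 = [0.5, 1.0, 8.5, 8.8, 4.5, 0.8, 8.2, 3.0]
scores_15 = [2.0, 4.5, 7.5, 10.0, 13.0, 6.5, 9.5, 11.5]

usage_9 = range_usage(scores_9, 9.0)
usage_15 = range_usage(scores_15, 15.0)

# Share of ratings placed in the first and last ranges of each scale.
extremes_9 = usage_9[0] + usage_9[-1]
extremes_15 = usage_15[0] + usage_15[-1]
print(usage_9, extremes_9)
print(usage_15, extremes_15)
```

Because the ranges are defined relative to scale length, the two panels can be compared directly even though their raw scores are on different lengths.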

Conclusion

The studied variables of descriptive sensory evaluation were influenced by the scale length. This influence behaved similarly in the two methods, differing only in sensitivity regarding the sample × evaluator interaction, whose effect was more pronounced in the CP method. For the conditions studied, the 15 cm line scale is more advantageous because it provides greater discrimination and decreases the interaction effect. Although repeatability tended to be better on the 9 cm scale, this variable was the least influenced by scale length.

Acknowledgments

The authors would like to acknowledge the National Research Council—CNPq for their financial support.

Footnotes

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Barbin D. Componentes de variância: teoria e aplicações. 2. Piracicaba: FEALQ; 1993.
  2. Blancher G, Chollet S, Kesteloot R, et al. French and Vietnamese: how do they describe texture characteristics of the same food? A case study with jellies. Food Qual Prefer. 2007;18:560–575. doi: 10.1016/j.foodqual.2006.07.006.
  3. Brannan RG. Effect of grape seed extract on descriptive sensory analysis of ground chicken during refrigerated storage. Meat Sci. 2009;81:589–595. doi: 10.1016/j.meatsci.2008.10.014.
  4. Carlin A, Kempthorne O, Gordon J. Some aspects of numerical scoring in subjective evaluation of foods. J Food Sci. 1956;21:273–281. doi: 10.1111/j.1365-2621.1956.tb16921.x.
  5. Castilhos MB, Del Bianchi V, Gómez-Alonso S, et al. Sensory descriptive and comprehensive GC-MS as suitable tools to characterize the effects of alternative winemaking procedures on wine aroma. Part II: BRS Rúbea and BRS Cora. Food Chem. 2020;311:126025. doi: 10.1016/j.foodchem.2019.126025.
  6. Dairou V, Sieffermann J-M. A comparison of 14 jams characterized by conventional profile and a quick original method, the flash profile. J Food Sci. 2002;67:826–834. doi: 10.1111/j.1365-2621.2002.tb10685.x.
  7. Damasio M, Costell E. Análisis sensorial descriptivo: generación de descriptores y selección de catadores. Rev Agroquímica Tecnol Aliment. 1991;31:165–178.
  8. Gamba MM, Lima Filho T, Della Lucia SM, et al. Performance of different scales in the hedonic threshold methodology. J Sens Stud. 2020:1–15. doi: 10.1111/joss.12592.
  9. Ginés R, Valdimarsdottir T, Sveinsdottir K, Thorarensen H. Effects of rearing temperature and strain on sensory characteristics, texture, colour and fat of Arctic charr (Salvelinus alpinus). Food Qual Prefer. 2004;15:177–185. doi: 10.1016/S0950-3293(03)00056-9.
  10. Hong JH, Duncan SE, Dietrich AM. Effect of copper speciation at different pH on temporal sensory attributes of copper. Food Qual Prefer. 2010;21:132–139. doi: 10.1016/j.foodqual.2009.08.010.
  11. Jeon S-Y, O’Mahony M, Kim K. A comparison of category and line scales under various experimental protocols. J Sens Stud. 2004;19:49–66. doi: 10.1111/j.1745-459X.2004.tb00135.x.
  12. Jeyaprakash S, Heffernan J, Driscoll R, Frank D. Impact of drying technologies on tomato flavor composition and sensory quality. LWT - Food Sci Technol. 2020;120:108888. doi: 10.1016/j.lwt.2019.108888.
  13. Lawless H, Malone G. The discriminative efficiency of common scaling methods. J Sens Stud. 1986;1:85–98. doi: 10.1111/j.1745-459X.1986.tb00160.x.
  14. Lee CA, Vickers ZM. Discrimination among astringent samples is affected by choice of palate cleanser. Food Qual Prefer. 2010;21:93–99. doi: 10.1016/j.foodqual.2009.08.003.
  15. Meilgaard M, Civille G, Carr B. Sensory evaluation techniques. 4. CRC Press; 2006.
  16. Mielby LH, Hopfer H, Jensen S, et al. Comparison of descriptive analysis, projective mapping and sorting performed on pictures of fruit and vegetable mixes. Food Qual Prefer. 2014;35:86–94. doi: 10.1016/j.foodqual.2014.02.006.
  17. Minim VPR, Silva RCSN. Análise Sensorial Descritiva. 1. Viçosa-MG: Editora UFV; 2016.
  18. Minim VPR, Silva MA, Cecchi HM. Perfil sensorial de ovos de Páscoa. Ciência e Tecnol Aliment. 2000;20:47–50. doi: 10.1590/S0101-20612000000100010.
  19. Murray J, Delahunty C, Baxter I. Descriptive sensory analysis: past, present and future. Food Res Int. 2001;34:461–471. doi: 10.1016/S0963-9969(01)00070-9.
  20. Park J-Y, Jeon S-Y, O’Mahony M, Kim K-O. Induction of scaling errors. J Sens Stud. 2004;19:261–271. doi: 10.1111/j.1745-459X.2004.tb00147.x.
  21. Park JY, O’Mahony M, Kim KO. “Different-stimulus” scaling errors; effects of scale length. Food Qual Prefer. 2007;18:362–368. doi: 10.1016/j.foodqual.2006.03.021.
  22. Pecore S, Kamerud J, Holschuh N. Ranked-scaling: a new descriptive panel approach for rating small differences when using anchored intensity scales. Food Qual Prefer. 2015;40:376–380. doi: 10.1016/j.foodqual.2014.02.002.
  23. Picouet PA, Gou P, Pruneri V, et al. Implementation of a quality by design approach in the potato chips frying process. J Food Eng. 2019;260:22–29. doi: 10.1016/j.jfoodeng.2019.04.013.
  24. Poulton E. Unwanted range effects from using within-subject experimental designs. Psychol Bull. 1973;80:113–121. doi: 10.1037/h0034731.
  25. Purdy JM, Armstrong G, McIlveen H. Three scaling methods for consumer rating of salt intensity. J Sens Stud. 2002;17:263–274. doi: 10.1111/j.1745-459X.2002.tb00347.x.
  26. Shand P, Hawrysh Z, Hardin R, Jeremiah L. Descriptive sensory assessment of beef steaks by category scaling, line scaling and magnitude estimation. J Food Sci. 1985;50:495–500. doi: 10.1111/j.1365-2621.1985.tb13435.x.
  27. Sharma M, Kristo E, Corredig M, Duizer L. Effect of hydrocolloid type on texture of pureed carrots: rheological and sensory measures. Food Hydrocoll. 2017;63:478–487. doi: 10.1016/j.foodhyd.2016.09.040.
  28. Silva MP, Damásio M. Análise sensorial descritiva. Campinas: Fundação Tropical de Pesquisas e Tecnologia "André Tosello"; 1994.
  29. Silva RCSN, Minim VPR, Simiqueli AA, et al. Optimized descriptive profile: a rapid methodology for sensory description. Food Qual Prefer. 2012;24:190–200. doi: 10.1016/j.foodqual.2011.10.014.
  30. Silva AN, Silva RCSN, Ferreira MAM, et al. Performance of hedonic scales in sensory acceptability of strawberry yogurt. Food Qual Prefer. 2013;30:9–21. doi: 10.1016/j.foodqual.2013.04.001.
  31. Silva RCSN, Minim VPR, Carneiro JD, et al. Quantitative sensory description using the optimized descriptive profile: comparison with conventional and alternative methods for evaluation of chocolate. Food Qual Prefer. 2013;30:169–179. doi: 10.1016/j.foodqual.2013.05.011.
  32. Silva RCSN, Minim VPR, Silva AN, et al. Optimized descriptive profile: how many judges are necessary? Food Qual Prefer. 2014;36:3–11. doi: 10.1016/j.foodqual.2014.02.011.
  33. Silva RCSN, Minim VPR, Silva AN, et al. Validation of optimized descriptive profile (ODP) technique: accuracy, precision and robustness. Food Res Int. 2014;66:445–453. doi: 10.1016/j.foodres.2014.10.015.
  34. Simiqueli AA, Minim VPR, Silva RCSN, et al. How many assessors are necessary for the optimized descriptive profile when associated with training? Food Qual Prefer. 2015;44:62–69. doi: 10.1016/j.foodqual.2015.03.019.
  35. Stone H, Sidel J. Sensory evaluation practices. 3. New York: Academic Press; 2004.
  36. Varela P, Ares G. Sensory profiling, the blurred line between sensory and consumer science. A review of novel methods for product characterization. Food Res Int. 2012;48:893–908. doi: 10.1016/j.foodres.2012.06.037.
  37. Wszelaki AL, Delwiche JF, Walker SD, et al. Consumer liking and descriptive analysis of six varieties of organically grown edamame-type soybean. Food Qual Prefer. 2005;16:651–658. doi: 10.1016/j.foodqual.2005.02.001.

Articles from Journal of Food Science and Technology are provided here courtesy of Springer
