PLOS ONE. 2021 Jun 1;16(6):e0251909. doi: 10.1371/journal.pone.0251909

Development and validation of the facial scale (FaceSed) to evaluate sedation in horses

Alice Rodrigues de Oliveira 1,#, Miguel Gozalo-Marcilla 2,#, Simone Katja Ringer 3, Stijn Schauvliege 4, Mariana Werneck Fonseca 1,#, Pedro Henrique Esteves Trindade 1,#, José Nicolau Prospero Puoli Filho 1, Stelio Pacca Loureiro Luna 1,*,#
Editor: Chang-Qing Gao
PMCID: PMC8168851  PMID: 34061878

Abstract

Although facial characteristics are used to estimate horse sedation, there are no studies measuring their reliability and validity. This randomised controlled, prospective, horizontal study aimed to validate a facial sedation scale for horses (FaceSed). Seven horses received detomidine infusion i.v. in low or high doses/rates alone (DL 2.5 μg/kg+6.25 μg/kg/h; DH 5 μg/kg+12.5 μg/kg/h) or combined with methadone (DLM and DHM, 0.2 mg/kg+0.05 mg/kg/h) for 120 min, or acepromazine boli i.v. in low (ACPL 0.02 mg/kg) or high doses (ACPH 0.09 mg/kg). Horses’ faces were photographed at i) baseline, ii) peak, iii) intermediate, and iv) end of sedation. After randomisation of moments and treatments, photos were sent to four evaluators to assess the FaceSed items (ear position, orbital opening, relaxation of the lower and upper lip) twice, within a one-month interval. The intraclass correlation coefficients for intra- and interobserver reliability of FaceSed scores were good to very good (0.74–0.94) and moderate to very good (0.57–0.87), respectively. Criterion validity, based on Spearman correlations of the FaceSed with the numerical rating scale and with head height above the ground, was 0.92 and -0.75, respectively. All items and the FaceSed total score showed responsiveness (construct validity). According to the principal component analysis, all FaceSed items had load factors >0.50 in the first dimension. The high internal consistency (Cronbach’s α = 0.83) indicated good intercorrelation among items. Item-total Spearman correlations were adequate (rho 0.3–0.73), indicating homogeneity of the scale. All items showed sensitivity (0.82–0.97) to detect sedation; however, only orbital opening (0.79) and upper lip relaxation (0.82) were specific in detecting absence of sedation. The limitations were that facial expression was assessed from photographs, which do not capture facial movement, and that the horses were docile, which may have reduced specificity. The FaceSed is a valid and reliable tool to assess tranquilisation and sedation in horses.

Introduction

Sedation and tranquilisation for procedures in standing horses are alternatives to general anaesthesia [1] to reduce the high anaesthesia-related mortality (0.9%) in this species [2, 3]. The α-2 adrenergic receptor agonists are widely used for premedication before general anaesthesia, for longer procedures in the standing position, and for chemical restraint in horses. They include xylazine, romifidine, detomidine, medetomidine, and dexmedetomidine [4–8] and, when combined with opioids, they produce a synergistic analgesic effect [9] and minimise opioid-induced excitement and intestinal hypomotility in horses [10, 11]. Acepromazine does not provide analgesia but is an option to provide mild to moderate tranquilisation in horses [10].

Different scales have been developed using scoring systems to qualify and quantify sedation in horses under the effect of sedatives and tranquilisers [12]. The only objective measurement that does not require interpretation by the observer is head height above the ground (HHAG), first used in 1991 [13]. The HHAG is usually combined with other instruments to assess depth of sedation [12] and is measured either as the distance in centimetres between the head and the floor [13] or as the percentage of the head height compared to the pre-treatment value [14]. Studies consider that an HHAG value below 50% of the pre-treatment value represents sufficient sedation [4, 5, 14]. The few divergences in the literature regarding the anatomical location for HHAG measurement (i.e., nostrils [14], chin [15, 16], or lower lips [17]) may affect data reproducibility among studies. Another limitation of the HHAG is that, although lowering of the head is a typical and dose-dependent effect used to assess α-2 agonist-induced sedation, this method may not be applicable to assess the effects of other drugs, such as opioids and acepromazine.

Other methods to assess depth and quality of sedation are subjective because their interpretation is based on the experience of the observer with the effects of sedation in horses. They include the simple descriptive scale (SDS) [18], composite numerical rating scale (NRS) [19], and visual analogue scale (VAS) [20]. The visual analogue scale is represented by a line ranging from 0 (no sedation) to 10 cm (the most intense sedation possible) [1, 4, 14, 20–22]. It tends to correlate positively with the numerical rating scale, comprising the ordinal numbers 0 to 10 [21, 23]. The composite numerical rating scale incorporates descriptions for different sedation intensities within each proposed item, and the evaluator must assign one of the descriptions to each item [12, 13]. The simple descriptive scale consists of 0—no sedation, 1—mild, 2—moderate, and 3—marked sedation, to be chosen by the appraiser [12, 24]. Unidimensional scales (VAS, NRS, and SDS) may be biased by the interpretation and experience of the evaluator, generating differences in results with doubtful representativeness when comparing studies [25].

The need for horse handling is another limitation of some instruments used to assess sedation. The HHAG, visual analogue scale, and numerical rating scale were adapted for both clinical [26, 27] and experimental purposes [11]. Postural instability (ataxia) is usually part of sedation scales [5], as well as threat response and movement of the head or ears in response to i. tactile [11], ii. visual, or iii. auditory stimuli [26]. These stimuli are i. touching the limb coronary band or inside the ears with a blunt object [8, 13, 19], ii. clapping hands [8, 13, 19], metallic sounds [5], blowing a horn or shaking a plastic gallon with stones [28, 29], and iii. shaking a towel or opening an umbrella in front of the horse [5, 8, 13, 19]. These stimuli may be cumbersome, and their delivery may vary according to the handler. The lack of a standard procedure can create confounding factors and, above all, these stimuli disturb the horse and may interfere with the depth of sedation, which could be a problem, notably in clinical situations.

Because facial assessment is based only on observation, the development of a scale based on facial expression would avoid the aforementioned limitations of sedation-assessing instruments. The relevance of facial expressions, together with behavioural characteristics of different species, was brought up by Charles Darwin as a way of expressing emotional states [30]. Facial expression has been widely used to evaluate pain in veterinary medicine, especially in horses [31–33], which are prone to changing their facial expression under different circumstances [34]. Facial expression is not a novelty to assess sedation in horses either. It has already been incorporated in simple descriptive [35, 36] and composite numerical scales [26] for this purpose. The expressions involve ear tip distance [35, 37] or movement in response to stimulation [26, 36], reduced eye alertness [35] or aperture [26], lip aperture [35], atonic lower lip [36], and lip oedema [37]. However, the validity of these expressions for identifying sedation has not yet been assessed [26].

Once developed, any instrument requires investigation to confirm whether or not it measures the intended construct. This may be accomplished by a validation process similar to those reported in other studies that validated behavioural pain scales in cats [38], horses [39], cattle [40], pigs [41], and sheep [42] and facial pain scales in cats [43], horses [31, 33, 44], and sheep [45]. For this, the repeatability and reproducibility (intra- and interobserver reliability, respectively) are evaluated, as well as the three ‘Cs’ of validity, referring to content, criterion, and construct; the last tests the responsiveness of the scale [46]. In addition, the item-total correlation and internal consistency identify the importance of each item of the scale to guarantee its homogeneity [47] and how much the items correlate with each other, respectively [46]. The sensitivity and specificity calculate the percentage of true positives (sedated horses) and true negatives (non-sedated horses), respectively, and the distribution of scores indicates the proportion of subitems of each category according to the intensity of sedation (i.e., subitems within each category representing the highest scores should be predominant in the deepest sedation status) [41].

Although a recent study validated a behavioural scale to assess sedation in horses [48], it requires handling of the horses. In the face of the limitations of the current methods used to qualify and quantify sedation and tranquilisation in horses, the objective of this study was to develop and validate a facial scale (FaceSed) to quantify sedation in horses based on facial characteristics, assessed without horse manipulation.

The data of the present study were collected in two previous simultaneous studies. The first aimed to identify a protocol that provided antinociception without excessive sedation [11] and the second targeted the development and validation of a sedation scale based on general behaviour [48]. The head heights above the ground transformed into percentages (HHAG%) were the only previously published data [11, 48] also used in the present study, and only for comparison with the newly proposed instrument. All other data presented here are original to the current study. The horses were tranquilised and/or sedated with different doses/rates of detomidine alone or associated with methadone (Phase I), predicted to produce moderate/deep sedation scores, and with two doses of acepromazine (Phase II), predicted to produce low and high tranquilisation scores. We hypothesised that the proposed scale adequately measures depth of sedation and tranquilisation through facial expression, according to the statistical standards of content, criterion, and construct validity, inter- and intraobserver reliability, item-total correlation, internal consistency, sensitivity and specificity, and cut-off point determination.

Materials and methods

This study was approved by the Ethics Committee on the Use of Animals (CEUA) for research at the School of Veterinary Medicine and Animal Science, São Paulo State University (UNESP), Botucatu, SP, Brazil, under protocol 2017/0051. The data of this study were collected during other previously published studies with different aims: pharmacokinetics [49, 50], antinociceptive effects [11], and development and validation of a behavioural scale for assessing sedation in horses [48] subjected to intravenous infusion of detomidine and methadone. The only data duplicated between this study and the previous ones [11, 48] are the HHAG% values, which were collected on-site.

Three geldings and four females, all Quarter Horse and Appaloosa crossbreds from the same herd (9–11 years, 372–450 kg), owned by the Edgardia experimental farm of the São Paulo State University (UNESP), Botucatu Campus, Brazil, were enrolled in the study. The horses were kept on pasture and fed hay and commercial feed once a day. All horses were healthy based on normal physical and laboratory tests (blood count and biochemistry: urea, alkaline phosphatase, alanine aminotransferase, and gamma-glutamyl transferase). The day before each experiment, the horses were brought in and kept in covered facilities with water ad libitum and access to an outside area. Solid fasting was established for two hours before the experiment began. Interventions for each individual horse were always performed on a fixed weekday and time (morning or afternoon), with at least seven days between treatments.

The study was divided into two phases performed with the same horses, which were subjected to all treatments. The interval between Phases I and II was eleven months. Except for HHAG%, Phase I data are exclusive to the present study, but were collected as part of other simultaneously performed studies in which the horses were sedated with constant rate infusions (CRIs) of detomidine alone or associated with methadone [11, 48–50]. The number of horses was based on this previous simultaneous study [11], where the sample size was estimated at n = 7 according to a pilot study based on HHAG and mechanical, thermal, and electrical nociceptive stimuli results [51] (α = 0.05, β = 0.80) [11]. Moreover, the sample size was corroborated by articles with similar methodologies [9, 52]. In the second phase (Phase II), the same horses were subjected to two boli of acepromazine (see flowchart in S1 Appendix).

Phase I

Phase I was conducted simultaneously with the previous studies [11, 48, 50]. Detomidine was chosen because it is the most commonly used α-2 adrenergic receptor agonist for continuous infusion, it is used worldwide, published pharmacokinetic data are available, and previous studies have demonstrated the effect of the detomidine and methadone combination [9, 50, 52].

After weighing and directing the horses to the six-square-metre experimental room, the left jugular vein area was clipped, asepsis was performed, and a 14-gauge catheter [G14 x 70 mm—Delta Med Srl, Italy] was inserted and fixed for drug administration. The horses were then placed in the restraining stocks inside the experimental stall. The HHAG, an objective parameter used in the original study [11], in a recently published behavioural sedation scale study [48], and in the present study (FaceSed validity), was measured on-site in cm without disturbing the horse, using a scale attached to the wall 1.5 m to the side. For convenience, HHAG measurements were transformed into percentages, considering the baseline as 100% of the HHAG, as in other reports [14]. The baseline HHAG was measured in centimetres when the horses were not sedated and positioned in the stocks, just before administration of the treatments. This value was considered as 100% of the HHAG and used for comparison against the other time-points. Afterwards, the face of the horse was photographed from the lateral and oblique craniocaudal positions to record the baseline moment. Subsequently, tactile, auditory, and visual stimuli were applied to evaluate the depth of sedation (data published previously [11]).
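
As a simple illustration of this normalisation, a minimal sketch in R with a hypothetical data frame and column names (not the code used in the study): each measurement in centimetres is divided by that horse's baseline value and multiplied by 100.

```r
# Hypothetical example: HHAG in cm for two horses at three time-points.
hhag <- data.frame(
  horse    = c("A", "A", "A", "B", "B", "B"),
  time_min = c(0, 120, 240, 0, 120, 240),
  hhag_cm  = c(140, 45, 132, 150, 60, 148)
)

# Baseline (time 0) is taken as 100% for each horse.
base <- subset(hhag, time_min == 0, select = c(horse, hhag_cm))
names(base)[2] <- "baseline_cm"

hhag <- merge(hhag, base, by = "horse")
hhag$hhag_pct <- 100 * hhag$hhag_cm / hhag$baseline_cm
```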

Once baseline measurements were taken, one of the following intravenous (i.v.) treatments (bolus + CRI for 120 min) was administered in a random manner (for the original crossover study) [11] by one of the evaluators (M.G.M.), who was unaware of the treatment. The treatments were previously numbered and randomised using a website [53] for each horse, and the sequence was registered by another author responsible for the CRIs (M.W.F.). The treatments were DL—low detomidine dose [Eqdomin, 10 mg/ml—Ourofino Saúde Animal, São Paulo, Brazil] (2.5 μg/kg followed by 6.25 μg/kg/h CRI), DH—high detomidine dose (5 μg/kg followed by 12.5 μg/kg/h CRI), DLM—low detomidine dose with methadone [Mytedom 10 mg/ml—Cristália Produtos Químicos e Farmacêuticos Ltda, São Paulo, Brazil] (2.5 μg/kg of detomidine + 0.2 mg/kg of methadone followed by detomidine 6.25 μg/kg/h + methadone 0.05 mg/kg/h CRIs), and DHM—high detomidine dose with methadone (5 μg/kg of detomidine + 0.2 mg/kg of methadone followed by detomidine 12.5 μg/kg/h + methadone 0.05 mg/kg/h CRIs). The 120-min CRIs were administered after the drug bolus using two syringe drivers [DigiPump SR8x—Digicare Biomedical Technology Inc, Florida, USA, and Pilot Anaesthesia—Fresenius Vial, Brezins, France], one for each drug. The horses were kept in the stocks for four hours for blood sampling for previously published studies performed simultaneously [11, 49, 50]. Apart from the baseline time-point, HHAG and sedation evaluations and photographic records were performed on-site 5, 15, 30, 60, 90, 120, 150, 180, 210, and 240 min after the initial bolus administration of each treatment. These time-points were selected according to previous studies from our group using a similar methodology [9, 52], and based on the judgment that the effect of detomidine and methadone would have abated 120 minutes after the end of the infusion [49, 50]. For the present study only the photographic records were analysed. HHAG was transformed into percentages from data already reported in a previous study performed simultaneously [11].

Phase II

Phase II was performed for this study and for another recently published study that developed and validated a behavioural sedation scale in horses [48]. The only duplicated data from that study are the HHAG% values. Eleven months after Phase I, new physical and laboratory evaluations confirmed the good health of the same horses. An i.v. bolus of acepromazine [Acepran 1%—Vetnil Indústria e Comércio de Produtos Veterinários Ltda, Louveira, São Paulo, Brazil] was injected at a low (ACPL 0.02 mg/kg) or high dose (ACPH 0.09 mg/kg), and the photos of the face of the horses were taken on-site at the time-points 5, 15, 30, 60, 90, and 120 min, as in Phase I but only up to 120 min. Because there was no other concomitant study during this phase, it was not necessary, unlike in Phase I, to keep the horses in the stall for longer than 120 minutes, which would have made them restless.

Selection of photos and evaluations

The representative moments of sedation were selected for Phases I and II based on the reactions to tactile, auditory, and visual stimuli on-site [11], as the animals’ faces were not evaluated on-site. For the present study, a total of 168 moments were selected for photographic evaluation (7 horses x 6 treatments x 4 representative moments). With the 7 horses and the 6 treatments described (four from Phase I and two from Phase II), the four representative moments of the different sedation intensities were the following: i) baseline, ii) peak sedation, iii) intermediate sedation and iv) end of sedation.

For Phase I, i) the baseline moment occurred before the administration of the drug(s), and ii) the peak of sedation was considered 120 min after the bolus administration, immediately before the CRI(s) ended. This moment was selected according to the parallel pharmacokinetic study [49, 50], which showed high detomidine plasma concentrations, corroborated by the high sedation scores obtained in the simultaneous parallel study, based on the tactile, auditory, and visual stimuli performed to evaluate the depth of sedation on-site and published elsewhere [11]. iii) Intermediate sedation occurred 30 min after peak sedation, 150 min after bolus administration (30 min after discontinuing the drug infusion), and iv) the end of sedation was 240 min after bolus administration, characterised by low sedation scores, close to baseline, according to the tactile, auditory, and visual stimuli recorded on-site [11], and by residual plasma drug concentrations [49].

For Phase II, with boli of acepromazine, i) the baseline moment occurred before the administration of the drug, ii) the peak of sedation was considered 60 min after the administration of acepromazine, based on the degree of sedation according to the tactile, auditory, and visual stimuli recorded on-site in the simultaneous parallel study [48], iii) intermediate sedation occurred at 90 min, when the tranquilisation score registered on-site was moderate, and iv) the end/reduction of tranquilisation was at 120 min after acepromazine administration, when the on-site sedation score was low. The final time-point of 120 min was chosen for convenience, to avoid restraining the horses in the stall for longer periods. The sum of the scores of the degree of sedation according to the tactile, auditory, and visual stimuli recorded at the experimental moment for Phase II has not been published elsewhere. Each stimulus was scored from 0 to 3, where zero represented no sedation and 3 the deepest sedation.

Four evaluators, experienced with sedation in horses, who did not communicate with each other, observed and scored the two photos (lateral and oblique cranio-caudal) of the face of the horse at each moment. These were the responsible researcher (A.R.O.–Evaluator 1, E1) and three evaluators from different institutions holding a Diploma from the European College of Veterinary Anaesthesia and Analgesia [M.G.M.–Evaluator 2, E2 (since 2014); S.K.R.–Evaluator 3, E3 (since 2009); and S.S.–Evaluator 4, E4 (since 2009)]. All the evaluators were aware of the goal of the coding. The FaceSed was not analysed or scored on-site. Evaluators 1 and 2 were present on-site during the experiment, and evaluator 1 was responsible for selecting the photos. Although this evaluator cannot be considered completely blinded, she was not aware of the treatments and moments, and it would be almost impossible in practical terms to memorise the 168 selected photos.

Prior to the start of the analysis, the evaluators were trained by evaluating 16 pairs of photos chosen at random (approximately 10% of the 168 photo pairs evaluated). The pairs of photos were selected from four horses at each moment of each detomidine treatment by the main author (evaluator 1). The pairs of photos were randomised not only for the moments of each animal in sequence, but also for the animals, time-points, and treatments. The photos were evaluated twice independently by each evaluator with an interval of one month, and the randomisations were performed separately for the two observations. After confirming high (≥ 80%) intraobserver (comparison between the first and second evaluation for each observer) and interobserver (Spearman correlation matrix among observers at the second evaluation) correlations [46] regarding the attribution of scores, the main evaluations were started. This analysis was performed to guarantee that the evaluators would be reliable and genuinely involved in the study.

The 168 moments (pair of photos—lateral and oblique cranio-caudal) were randomised in the same way as in training. In this manner the evaluators scored the photos in two stages, each with different randomisation, at least one month apart. Evaluation spreadsheets and guidelines (S2 Appendix) for completing the data were made available. Evaluators were asked not to work on the project for more than one hour a day to avoid fatigue and to score first the numerical rating scale from 0 (not sedated) to 10 (maximum sedation) [39], followed by the FaceSed (S2 Appendix).

Development of the FaceSed scale

The initial development of the FaceSed was adapted and modified from three of the six facial action units described in the horse grimace scale for horses in pain compared with pain-free horses [31]. Only the orbital aperture was coincident between the grimace scale and the FaceSed. The grimace scale description of eye tightening corresponds to the eyelid partially or completely closed in the FaceSed. Strained mouth and lower lip with a pronounced chin from the grimace scale were modified to relaxation of the upper and lower lip. Stiff backwards ears from the horse grimace scale were adapted to the opening between the ear tips. In the present study, the grimace scale descriptions were adapted for the FaceSed to expressions indicative of muscle relaxation, characterised by the inability to sustain and move the facial action units. Other facial units from the grimace scale were excluded. After this first step comparing the grimace scale and the FaceSed, other studies were assessed to find other potentially useful facial units that would call attention and might be easily identified by a human [26, 32, 33, 35, 36, 54]. The movements in the facial units of the eyes, ears, and lips were identified according to EquiFACS [54]. However, because none of the authors was EquiFACS certified, it was only used to identify possible indicators of movement of the facial musculature resembling easily identifiable facial units described in other studies of horses under pain [32, 33] and sedation [26, 35, 36]. Other studies developing facial pain scales in horses suffering colic or noxious stimuli [32, 33] were also considered for the evaluation of the eyes, ears, and lips. Evaluations of the eyes [26, 36], ears [26], and lips [35, 36] have also been explored in studies that investigated the effect of different sedation protocols. Based on these observations, the FaceSed was developed with three scores for each item. Descriptors were based on the expected expression of muscle relaxation of each facial action unit when sedation was absent, partial, or obvious.

Content validation was performed to identify whether the descriptions of the items were clear and relevant to the theme of the scale. It was achieved in three steps: i) the use of facial characteristics already described in studies of sedation, stress, and pain in horses [31–33, 54, 55], ii) the analysis of the semantic clarity of the content of the scale by the four principal evaluators before starting the training and main evaluation, and iii) the analysis of relevance, performed by three external experienced veterinary anaesthesiologists (M.O.T., F.A.O., and C.L.) not involved in the study, who rated the importance of each item on the scale as relevant (+1), unable to judge (0), or irrelevant (-1). The relevance values attributed by the three veterinary anaesthesiologists were summed and divided by three; the resulting score ranges from -1 to +1, and values greater than 0.5 were considered relevant [56].
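
The relevance index for each sub-item therefore reduces to the mean of the three expert ratings; a minimal sketch with hypothetical ratings:

```r
# Three external experts rate one sub-item: +1 relevant, 0 unable to judge, -1 irrelevant.
expert_ratings <- c(1, 1, 0)

relevance <- sum(expert_ratings) / 3   # 0.67 for this hypothetical sub-item
relevance > 0.5                        # TRUE: the sub-item would be considered relevant
```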

After the content was validated, the 168 photos were analysed twice by each of the four main evaluators and the analysis described below was performed. Only HHAG% data were collected on-site and published before [11, 48].

Statistical analyses

Reliability

The reliability of the numerical rating scale and of the individual FaceSed items was calculated using the weighted kappa coefficient (Kw), and that of the FaceSed sum using the intraclass correlation coefficient (ICC), agreement type, with confidence intervals. For repeatability, the data from the first evaluation were compared with the second evaluation for each evaluator. For reproducibility, the scores of the first and second evaluations of all evaluators were compared using an agreement matrix. Weighted kappa and ICC were interpreted as very good (0.81–1.0), good (0.61–0.80), moderate (0.41–0.60), reasonable (0.21–0.40), or poor (< 0.2) [46, 57].
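
A minimal sketch of these calculations in R, using the irr package with hypothetical scores; the squared (quadratic) weighting for kappa is shown only as one possible choice, since the weighting scheme is not specified above:

```r
library(irr)  # provides kappa2() and icc()

# Hypothetical example: one evaluator's first and second scores (0-2) for the
# "ears" item, and the FaceSed sum (0-8), over ten photos.
ears  <- data.frame(first  = c(0, 1, 2, 2, 1, 0, 2, 1, 0, 2),
                    second = c(0, 1, 2, 1, 1, 0, 2, 2, 0, 2))
total <- data.frame(first  = c(1, 4, 7, 6, 3, 0, 8, 5, 2, 7),
                    second = c(1, 5, 7, 6, 3, 1, 8, 6, 2, 7))

# Intraobserver reliability of the ordinal item: weighted kappa.
kappa2(ears, weight = "squared")

# Repeatability of the FaceSed sum: agreement-type ICC with confidence interval.
icc(total, model = "twoway", type = "agreement", unit = "single")
```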

The following analyses were performed with data from all evaluators, treatments, and grouped moments. The exceptions are described below.

Concurrent criteria validation

The most common way to test concurrent criterion validity is to correlate the proposed instrument with a gold-standard one measuring the same construct [46]. However, because no validated or gold-standard scale to evaluate sedation in horses existed when the study was performed, the numerical rating scale was used, as it is a simple and intuitive scale, and the HHAG% was used because it is an objective measurement commonly used in studies assessing sedation in horses. For this analysis, treatments were grouped according to their similarities in sedation intensity, to evaluate any differences between i) tranquilisation (ACPL + ACPH) and sedation with ii) low (DL + DLM) or iii) high detomidine doses (DH + DHM). To test concurrent criterion validity, the Spearman correlation was measured between the FaceSed and the numerical rating scale and HHAG%, with the interpretation 0–0.35—low correlation; 0.35–0.7—moderate correlation; 0.7–1.0—high correlation [46, 56, 58].
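
A minimal sketch of the correlation step, using hypothetical FaceSed, numerical rating scale, and HHAG% values:

```r
# Hypothetical scores for eight evaluated moments.
d <- data.frame(facesed  = c(1, 2, 4, 6, 7, 5, 3, 0),
                nrs      = c(2, 3, 5, 7, 9, 6, 4, 1),
                hhag_pct = c(100, 95, 60, 40, 25, 55, 80, 100))

cor(d$facesed, d$nrs,      method = "spearman")  # expected strongly positive
cor(d$facesed, d$hhag_pct, method = "spearman")  # expected strongly negative

# cor.test() returns the same rho together with a p-value.
cor.test(d$facesed, d$nrs, method = "spearman")
```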

Construct validity (responsiveness)

Construct validity investigates whether the scale measures what it sets out to measure. For responsiveness, it is expected that when horses are deeply sedated, their sedation scores should be higher than when sedation is abating or absent [56]. The data did not pass the Shapiro–Wilk normality test. Therefore, a Friedman test was used for all variables to evaluate differences over time (baseline, peak sedation, intermediate, and end of sedation) for each treatment, and the Kruskal–Wallis test to compare the treatments within each moment.
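
A minimal sketch of these tests with hypothetical long-format data (one row per horse and moment); the actual analysis used seven horses and all treatments:

```r
# Hypothetical data: 4 horses x 4 moments for one treatment.
one_trt <- data.frame(
  horse   = factor(rep(1:4, each = 4)),
  moment  = factor(rep(c("baseline", "peak", "intermediate", "end"), times = 4)),
  facesed = c(1, 7, 5, 2,   0, 6, 4, 1,   2, 8, 6, 2,   1, 7, 5, 3)
)

# Friedman test: differences across the four moments within one treatment,
# with horse as the blocking factor.
friedman.test(facesed ~ moment | horse, data = one_trt)

# Kruskal-Wallis test: differences between treatments within one moment (peak).
peak <- data.frame(
  treatment = factor(rep(c("ACPH", "DL", "DH"), each = 4)),
  facesed   = c(3, 4, 2, 4,   5, 6, 6, 5,   7, 8, 7, 8)
)
kruskal.test(facesed ~ treatment, data = peak)
```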

Principal component analysis

Principal component analysis was used to define the dimensions of the scale according to the distribution of the items and how they correlate with each other [58]. Its interpretation was based on the Kaiser criterion [59], whereby dimensions with eigenvalues > 1 and variance > 20% are retained and items with load factors ≥ 0.50 or ≤ -0.50 are selected. The eigenvalues and variance are coefficients extracted from the correlation matrix of the principal component analysis that indicate the degree of contribution of each dimension, used to select only the representative dimensions [60].
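
A minimal sketch with hypothetical item scores; the loadings are scaled by the component standard deviations so that they are comparable with the load factors reported in the Results:

```r
# Hypothetical FaceSed item scores (0-2) for ten photos.
items <- data.frame(
  ears      = c(0, 1, 2, 2, 1, 0, 2, 1, 0, 2),
  orbital   = c(0, 1, 2, 2, 1, 0, 2, 2, 0, 1),
  lower_lip = c(0, 0, 2, 1, 1, 0, 2, 1, 0, 2),
  upper_lip = c(0, 1, 1, 2, 0, 0, 2, 1, 0, 2)
)

pca <- prcomp(items, scale. = TRUE)

eigenvalues  <- pca$sdev^2                            # Kaiser criterion: keep > 1
variance_pct <- 100 * eigenvalues / sum(eigenvalues)  # keep dimensions explaining > 20%

# Loadings scaled by the component standard deviations (select |loading| >= 0.50).
loadings <- pca$rotation %*% diag(pca$sdev)
eigenvalues; variance_pct; loadings
```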

Internal consistency

Another method used to evaluate the intercorrelation among items of the FaceSed was the internal consistency by Cronbach’s α coefficient [61]. Minimally acceptable values are between 0.60–0.64, acceptable 0.65–0.69, good 0.70–0.74, very good 0.75–0.80, and excellent above 0.80 [62].
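
A minimal sketch using psych::alpha() with the same kind of hypothetical item scores; the function also reports α with each item dropped, as presented in Table 6:

```r
library(psych)

# Hypothetical FaceSed item scores (0-2) for ten photos, as in the PCA sketch above.
items <- data.frame(
  ears      = c(0, 1, 2, 2, 1, 0, 2, 1, 0, 2),
  orbital   = c(0, 1, 2, 2, 1, 0, 2, 2, 0, 1),
  lower_lip = c(0, 0, 2, 1, 1, 0, 2, 1, 0, 2),
  upper_lip = c(0, 1, 1, 2, 0, 0, 2, 1, 0, 2)
)

a <- alpha(items)
a$total$raw_alpha  # overall Cronbach's alpha
a$alpha.drop       # alpha recalculated with each item excluded
```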

Item-total correlation

To find out if the items contributed to the total score of the scale in a homogeneous manner, Spearman’s item-total correlation was performed between each item and the total sum of FaceSed after excluding the evaluated item. The value of each item in this analysis is interpreted as the individual relevance of the item compared to the total score of the scale. Acceptable values are between 0.30 and 0.70 [46].
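
A minimal sketch of this corrected item-total correlation with hypothetical item scores:

```r
# Hypothetical FaceSed item scores (0-2) for ten photos, as in the sketches above.
items <- data.frame(
  ears      = c(0, 1, 2, 2, 1, 0, 2, 1, 0, 2),
  orbital   = c(0, 1, 2, 2, 1, 0, 2, 2, 0, 1),
  lower_lip = c(0, 0, 2, 1, 1, 0, 2, 1, 0, 2),
  upper_lip = c(0, 1, 1, 2, 0, 0, 2, 1, 0, 2)
)

# For each item: Spearman correlation with the sum of the remaining items.
item_total <- sapply(names(items), function(it) {
  rest <- rowSums(items[, setdiff(names(items), it), drop = FALSE])
  cor(items[[it]], rest, method = "spearman")
})
item_total  # values between 0.30 and 0.70 are considered acceptable
```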

Sensitivity and specificity

This analysis is mainly performed to help identify the diagnostic accuracy. To calculate the sensitivity and specificity of a new test, it should be compared with a gold-standard test to identify the true positives and negatives among what is being measured. Because there is no validated scale to evaluate sedation in horses, sensitivity and specificity were calculated considering the presence (peak of sedation) or absence (baseline) of sedation. In the case of sedation, sensitivity ascertains whether the instrument identifies the true positives (high sedation scores compatible with deep sedation) and specificity determines the true negatives (absent sedation scores compatible with non-sedated animals). Sensitivity towards detecting the presence of sedation (regardless of the degree) was calculated as the ratio between horses with scores ≥ 1 at the peak of sedation (considered sedated, or true positives) and the total number of horses and, in a similar way, specificity as the ratio between horses with a score of 0 at baseline (not sedated, or true negatives) and the total number of horses. The interpretations are considered excellent at 95–100%, good at 85–94.9%, moderate at 70–84.9%, and non-sensitive or non-specific at <70% [39, 63].
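
A minimal sketch of these proportions with hypothetical scores for one item in seven horses:

```r
# Hypothetical scores for one item ("orbital opening", 0-2) in seven horses,
# at baseline and at the peak of sedation.
baseline <- c(0, 0, 1, 0, 0, 0, 1)
peak     <- c(2, 1, 2, 2, 1, 2, 2)

sensitivity <- mean(peak >= 1)      # proportion of sedated horses scoring >= 1
specificity <- mean(baseline == 0)  # proportion of non-sedated horses scoring 0
c(sensitivity = sensitivity, specificity = specificity)
```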

ROC curve for determination of the cut-off point of the FaceSed

The receiver operating characteristic (ROC) curve is the graphical representation of the relationship between sensitivity at the peak of sedation (to detect the truly sedated horses) and specificity at the baseline time-point (to distinguish the truly negative, non-sedated horses). The discriminatory capacity of the test is determined by the area under the curve [64]. Only horses treated with the high doses of detomidine were used for this analysis. The HHAG% was used as the predictive value because it is the most commonly used objective sedation measure in horses. Horses with an HHAG% ≤ 50% were considered truly sedated, because this is the value used in some studies to consider horses sufficiently sedated for standing procedures [4, 9]. To determine the cut-off point of the scale, the Youden index and the diagnostic uncertainty zone were defined. The Youden index is the score coinciding with the highest sensitivity and specificity of the scale according to the ROC curve. The diagnostic uncertainty zone, or grey zone, was derived from the 95% CI obtained by replicating the original ROC curve 1001 times using the bootstrap method and from the scores with sensitivity and specificity values > 90%; the interval between the lowest and highest values of these two methods is the grey zone [46, 65].
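
A minimal sketch with the pROC package and hypothetical data; the stratified bootstrap below is a simplified stand-in for the resampling procedure described above, with HHAG% ≤ 50% used to label the truly sedated observations:

```r
library(pROC)

# Hypothetical FaceSed sums and HHAG% for high-dose detomidine observations.
d <- data.frame(
  facesed  = c(7, 8, 6, 7, 5, 6, 8, 2, 1, 3, 2, 4, 1, 3),
  hhag_pct = c(25, 20, 40, 30, 48, 35, 22, 90, 100, 80, 95, 60, 100, 85)
)
d$sedated <- as.integer(d$hhag_pct <= 50)        # HHAG% <= 50% taken as truly sedated

roc_obj <- roc(d$sedated, d$facesed, levels = c(0, 1), direction = "<")
auc(roc_obj)                                     # discriminatory capacity
coords(roc_obj, "best", best.method = "youden")  # Youden cut-off, sensitivity, specificity

# Grey zone (simplified illustration): bootstrap the data, stratified by class,
# and collect the Youden cut-off from each replicated ROC curve.
pos <- which(d$sedated == 1)
neg <- which(d$sedated == 0)
cutoffs <- replicate(1001, {
  i <- c(sample(pos, replace = TRUE), sample(neg, replace = TRUE))
  r <- roc(d$sedated[i], d$facesed[i], levels = c(0, 1), direction = "<")
  unname(unlist(coords(r, "best", best.method = "youden", ret = "threshold")))[1]
})
quantile(cutoffs, c(0.025, 0.975))               # 95% interval of the cut-off scores
```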

Frequency distribution of scores

Finally, the frequency of the sedation scores assigned by the evaluators was calculated at the four evaluation time-points within each grouped treatment of tranquilisation (ACPL + ACPH) and sedation with low (DL + DLM) and high doses of detomidine (DH + DHM). This analysis was performed to assess the presence or absence of each score attributed to each item in horses under different depths of sedation or in the normal state, to investigate their representativeness and importance.
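
A minimal sketch with hypothetical scores for one item, tabulating the proportion of each score per moment:

```r
# Hypothetical long-format data: one item score per photo, with the moment.
d <- data.frame(
  moment = rep(c("baseline", "peak", "intermediate", "end"), each = 6),
  ears   = c(0, 0, 1, 0, 1, 0,   2, 2, 1, 2, 2, 1,   1, 2, 1, 2, 1, 1,   0, 1, 0, 1, 0, 1)
)

# Proportion of each score (0, 1, 2) per moment; rows sum to 1.
prop.table(table(d$moment, d$ears), margin = 1)
```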

The statistical analysis in this manuscript was performed using R software in the RStudio integrated development environment (RStudio Team– 2016) and Microsoft Office® (Excel—2019). The statistical significance was accepted at p<0.05.

Results

The FaceSed showed intra- and interobserver reliability, content, criterion and construct validity, homogeneity (item-total correlation), good internal consistency and sensitivity to assess sedation in horses.

Content validity

The semantic clarity of the content of the scale of the characteristics described at FaceSed based on previous studies [31, 33, 44, 54] was approved (Table 1 and Fig 1).

Table 1. Content validity of FaceSed developed to evaluate the degree of sedation in horses.

Area evaluated Relaxation Intensity Scores Relevance (-1, 0, 1) References
Ears [31, 33, 44, 54]
No opening between the ear tips, position of attention 0 0.66
Partial opening between the ear tips or asymmetry 1 0.33
Wide opening between the ear tips (ears relaxed) 2 0.66
Eyes (orbital opening) [31, 33, 44, 54]
Eyes completely opened 0 1
Eyes partially opened 1 1
Eyes almost or completely closed 2 1
Relaxation of the lower lip [33, 44, 54]
No signs of lower lip relaxation and/or closed mouth 0 1
Slight relaxation of lower lip 1 1
Pronounced relaxation of lower lip and/or open mouth 2 1
Relaxation of the upper lip
No signs of upper lip relaxation 0 0.33
Slight upper lip relaxation 1 0
Pronounced upper lip relaxation 2 0.33

FaceSed—Numerical Facial Scale of Sedation in Horses. Values greater than 0.5 are considered relevant.

Fig 1. Facial sedation scale in horses (FaceSed).

The sub-item ‘partial opening between the tips of the ears or asymmetry’, representing score 1, presented a mean relevance below 0.5 (Table 1). All sub-items of upper lip relaxation had a mean below 0.5 from the external assessors, but the item was maintained because these characteristics have been described and used in previous studies.

Intraobserver reliability (repeatability)

The repeatability (ICC) of the sum of the FaceSed of all the four observers together ranged from good to very good (0.74–0.94) (Table 2). The repeatability of the numerical rating scale was very good (0.86–0.92).

Table 2. Intra-rater reliability of FaceSed and numerical rating scales between the first and second observations (confidence interval).

FaceSed Items E1 E2 E3 E4
kw CI kw CI kw CI kw CI
Ears 0.8 0.73–0.86 0.71 0.71–0.71 0.56 0.48–0.64 0.71 0.71–0.71
Orbital opening 0.85 0.79–0.91 0.85 0.8–0.9 0.72 0.67–0.77 0.86 0.86–0.86
Lower lip 0.72 0.68–0.76 0.62 0.53–0.7 0.68 0.62–0.75 0.7 0.66–0.74
Upper lip 0.77 0.77–0.77 0.7 0.67–0.73 0.62 0.53–0.71 0.77 0.71–0.82
NRS 0.9 0.9–0.9 0.86 0.86–0.86 0.91 0.91–0.91 0.92 0.92–0.92
ICC CI ICC CI ICC CI ICC CI
FaceSed 0.91 0.89–0.94 0.86 0.74–0.90 0.82 0.78–0.86 0.89 0.85–0.92

FaceSed—Numerical facial scale of sedation in horses. NRS -Numerical rating scale. E1—Evaluator 1, E2—Evaluator 2, E3—Evaluator 3, E4—Evaluator 4. Interpretation of weighted Kappa (Kw) and Intraclass correlation coefficient (ICC)—very good 0.81–1.0; good 0.61–0.80; moderate 0.41–0.60, reasonable 0.21–0.40; and poor < 0.20. Confidence interval (CI) [46, 57, 58].

Interobserver reliability (reproducibility)

The numerical rating scale showed very good agreements (0.83–0.88), while the FaceSed showed moderate to very good agreements (0.57–0.87).

The item orbital opening presented good to very good agreement (0.68–0.88) and the others presented reasonable to good agreement (0.26–0.71) between observers (Table 3).

Table 3. Inter-rater matrix comparison of FaceSed and numerical rating scale scores among all evaluators.

Evaluator E1 E2 E3
kw CI kw CI kw CI
Ears
E2 0.39 0.33–0.46
E3 0.55 0.47–0.62 0.54 0.46–0.62
E4 0.61 0.53–0.69 0.57 0.48–0.65 0.61 0.53–0.68
Orbital opening
E2 0.83 0.79–0.88
E3 0.74 0.68–0.80 0.75 0.71–0.79
E4 0.80 0.75–0.84 0.84 0.82–0.86 0.75 0.71–0.78
Lower lip
E2 0.44 0.26–0.63
E3 0.50 0.42–0.57 0.45 0.37–0.53
E4 0.64 0.59–0.70 0.49 0.40–0.59 0.45 0.37–0.52
Upper lip
E2 0.65 0.59–0.71
E3 0.54 0.47–0.61 0.52 0.45–0.59
E4 0.65 0.58–0.71 0.64 0.59–0.70 0.59 0.52–0.65
NRS
E2 0.83 0.83–0.83
E3 0.85 0.85–0.85 0.83 0.83–0.83
E4 0.88 0.88–0.88 0.83 0.83–0.83 0.83 0.83–0.83
ICC CI ICC CI ICC CI
FaceSed
E2 0.75 0.70–0.79
E3 0.64 0.57–0.70 0.73 0.67–0.77
E4 0.84 0.80–0.87 0.82 0.78–0.85 0.71 0.65–0.76

FaceSed—Numerical facial scale of sedation in horses. NRS—numerical rating scale. E1—Evaluator 1, E2—Evaluator 2, E3—Evaluator 3, E4—Evaluator 4. Interpretation of weighted Kappa (Kw) and Intraclass correlation coefficient (ICC)—very good 0.81–1.0; good 0.61–0.80; moderate 0.41–0.60, reasonable 0.21–0.40; and poor < 0.20. Confidence interval (CI) [46, 57, 58].

Concurrent criteria validation

The correlations of the sum of the FaceSed with the numerical rating scale for the animals treated with acepromazine, low and high detomidine doses, and all groups together were 0.85, 0.91, 0.95, and 0.92, respectively, and with HHAG% were -0.56, -0.77, -0.80, and -0.75, respectively.

Construct validity (responsiveness)

All FaceSed items, their sum, and the numerical rating scale presented higher scores at peak sedation/tranquilisation than at baseline for all treatments, and HHAG% was lower at peak than at baseline for all treatments except ACPL (Table 4).

Table 4. Responsiveness of FaceSed, numerical rating scale, and HHAG% over time and between treatments.

Moments
FaceSed Baseline Peak of sedation Intermediate End of sedation
Items Median Amplitude Median Amplitude Median Amplitude Median Amplitude
Ears ACPL 0.5bB 0–2 1aB 0–2 1aB 0–2 1abAB 0–2
ACPH 1bA 0–2 1aB 0–2 1aB 0–2 1aA 0–2
DL 1cAB 0–2 2aA 0–2 1bB 0–2 1bcAB 0–2
DLM 0cB 0–2 2aA 0–2 1bB 0–2 1cB 0–2
DH 1bAB 0–2 2aA 0–2 2aA 0–2 1bAB 0–2
DHM 0bB 0–2 2aA 0–2 2aA 0–2 1bB 0–2
All 1d 0–2 2a 0–2 2b 0–2 1c 0–2
Orbital opening ACPL 0b 0–1 1aD 0–2 1aC 0–2 1aB 0–2
ACPH 0b 0–1 1aCD 0–2 1aBD 1–2 1aA 0–2
DL 0c 0–1 2aAB 1–2 1bCD 0–2 0cC 0–1
DLM 0c 0–1 1aBC 0–2 1bC 0–2 0cC 0–1
DH 0b 0–2 2aA 1–2 1.5aAB 1–2 0bC 0–1
DHM 0b 0–1 2aA 0–2 2aA 0–2 0bBC 0–2
All 0d 0–2 2a 0–2 1b 0–2 1c 0–2
Lower Lip ACPL 0b 0–1 1aB 0–2 1aAB 0–2 1aAB 0–2
ACPH 0.5b 0–2 1aB 0–2 1aA 0–2 1aA 0–2
DL 0c 0–2 1aAB 0–2 1bcB 0–2 1bcBC 0–2
DLM 1b 0–2 1aB 0–2 1abB 0–2 0bC 0–2
DH 1c 0–2 2aA 1–2 1bA 0–2 0cC 0–2
DHM 1b 0–2 2aA 0–2 1aA 0–2 0bC 0–1
All 1c 0–2 1a 0–2 1b 0–2 1c 0–2
Upper Lip ACPL 0b 0–1 1aC 0–2 1aB 0–2 0.5a 0–2
ACPH 0c 0–1 1abBC 0–2 1aA 0–2 1bc 0–2
DL 0c 0–2 1aB 0–2 0bcB 0–2 0bc 0–2
DLM 0c 0–1 1aBC 0–2 1abB 0–2 0bc 0–2
DH 0c 0–2 2aA 0–2 1bA 0–2 0c 0–2
DHM 0b 0–1 2aA 0–2 1aA 0–2 0b 0–2
All 0d 0–2 1a 0–2 1b 0–2 0c 0–2
FaceSed ACPL 1c 0–5 4aD 1–8 3abC 1–7 3bAB 0–7
ACPH 2c 0–5 4abD 2–8 5aB 2–8 4bA 1–8
DL 2c 0–5 6aBC 3–8 3bC 0–8 2bcBC 0–5
DLM 1c 0–6 5aCD 1–8 4bC 0–8 2cC 0–6
DH 2c 0–8 7aA 4–8 6bAB 1–8 2cC 0–7
DHM 1b 0–4 7aAB 4–8 6aA 1–8 2bBC 0–5
All 1.5d 0–8 6a 1–8 4b 0–8 2c 0–8
NRS ACPL 2bB 0–5 4aE 2–10 4aC 2–7 4aB 0–7
ACPH 2bA 1–5 6aDE 3–8 5.5aB 3–10 5aA 2–9
DL 2cAB 0–4 7aBC 4–10 4bC 1–10 3cBC 1–6
DLM 2cAB 0–6 6aCD 3–10 4bC 1–10 2cC 0–5
DH 2cAB 0–10 9aA 5–10 6.5bAB 2–10 3cC 1–8
DHM 2bB 0–5 9aAB 4–10 7aA 2–10 3bC 1–6
All 2d 0–10 7a 2–10 5b 1–10 3c 0–9
HHAG%* ACPL 100a 100–100 89abA 27–100 84bAB 81–90 95ab 71–103
ACPH 100a 100–100 66bAB 53–91 82abAB 50–102 92ab 68–105
DL 100a 100–100 53bAB 22–71 93abA 44–105 103a 83–110
DLM 100a 100–100 64bAB 29–89 85abAB 47–95 100a 96–107
DH 100a 100–100 27bB 18–47 60abB 27–80 100a 86–105
DHM 100a 100–100 29bB 14–74 68abAB 26–95 102a 84–147
All 100a 100–100 53b 14–100 82b 26–105 100a 68–147

FaceSed—Numerical facial scale of sedation in horses, NRS—Numerical rating scale, *HHAG%—Head height above the ground (data collected in cm in previously published studies [11, 48], doi:10.1111/evj.13054 and doi:10.3389/fvets.2021.611729, respectively), ACPL + ACPH (acepromazine in low and high doses); DL + DLM (low dose detomidine and associated with methadone); DH + DHM (high dose detomidine and associated with methadone). Different lower-case letters represent statistical differences over time (p < 0.05) (a > b > c). Different capital letters represent statistical differences between treatments (p < 0.05) (A > B > C).

When analysing all grouped treatments, the FaceSed and numerical rating scale scores at the end of sedation were significantly different from baseline (Fig 2). However, when evaluating the treatments individually, the scores for the sum of the FaceSed at the end of sedation were higher than baseline only in the ACPL, ACPH and DHM groups, while for the numerical rating scale this was only true in the ACPL and ACPH groups (Table 4).

Fig 2. FaceSed (A) and numerical rating scale (B) scores before and after grouped treatments (ACPL, ACPH, DL, DLM, DH, and DHM).

ACPL + ACPH—low and high dose acepromazine, DL + DLM—low dose detomidine and associated with methadone, DH and DHM—high dose detomidine and associated with methadone. Different symbols indicate significant differences between them (p < 0.05). † > ‡ > § > ¶.

When comparing treatments at peak sedation, the FaceSed and numerical rating scale in DL and DLM groups were significantly different from DH and DHM respectively, and the treatments DH and DHM were significantly different from the acepromazine treatments (Table 4).

Principal component analysis

The multiple association analysis by principal components (Table 5) defined the scale as unidimensional, as it presented the largest load factors for each item in the first dimension (> 0.5), with eigenvalue > 1 and variance > 20 [59].

Table 5. Load values, eigenvalues, and variance of FaceSed items by principal component analysis.

FaceSed Items Load factors in Dimension 1 Load factors in Dimension 2
Ears 0.78 0.52
Orbital opening 0.86 0.11
Lower lip relaxation 0.77 -0.51
Upper lip relaxation 0.83 -0.13
Eigenvalue 2.63 0.56
Variance 65.72 14.12

FaceSed—Numerical facial scale of sedation in horses. Values in bold represent eigenvalues > 1, variance > 20, and load factor ≥ 0.50 or ≤ -0.50 approved according to the Kaiser criterion [59].

The vectors of all scale items pointed in the direction identifying intermediate and deep sedation (Fig 3).

Fig 3. FaceSed principal component analysis biplot.

Confidence ellipses were built according to the moments before and after sedation. Baseline (green); Peak of sedation (red); Intermediate (purple); End of sedation (blue). The ellipses on the left represent the absence or end of sedation and on the right represent the peak or intermediate sedation. The time-points peak and intermediate sedation influence all items on the scale since their vectors are directed to these ellipses.

Item-total correlation and internal consistency

The item-total correlation fell within the values considered acceptable, from 0.3 to 0.7 [46], except for the item orbital opening (0.73), and the internal consistency was excellent for all items on the scale (Table 6).

Table 6. Spearman item-total correlation and internal consistency of each FaceSed item.

FaceSed Item Item-total Correlation Internal Consistency
All items 0.83
Excluding Ears 0.60 0.80
Excluding Orbital opening 0.73 0.74
Excluding Lower lip relaxation 0.60 0.80
Excluding Upper lip relaxation 0.68 0.77

FaceSed—Numerical facial scale of sedation in horses. Interpretation of Spearman’s item-total correlation (rs)—values between 0.3 and 0.7 are accepted and stand out in bold [46]. Cronbach’s α coefficient was calculated for the total score of the scale and after excluding each item from the scale. Interpretation: minimally acceptable 0.60–0.64, acceptable 0.65–0.69, good 0.70–0.74, very good 0.75–0.80, and excellent > 0.80 [62]. Bold values are above 0.70.

Sensitivity and specificity

All items presented sensitivity to the isolated or grouped treatments, however, specificity was only observed for the items orbital opening and upper lip relaxation (Table 7).

Table 7. FaceSed sensitivity and specificity.

FaceSed Items Ears Orbital opening Lower lip relaxation Upper lip relaxation
All treatments
Sensitivity % 97 96 87 82
Specificity % 44 79 49 82

FaceSed—Numerical facial scale of sedation in horses. Interpretation: excellent 95–100%, good 85–94.9%, moderate 70–84.9%, and non-sensitive or non-specific < 70% [39, 63].

ROC curve and cut-off point of the FaceSed

The area under the curve was 0.96, representing the high precision of the scale (Fig 4). The Youden index was > 5 for all the evaluators. The resampling bootstrap CI was 4.5 to 5.5, and the range of scores with sensitivity and specificity > 90% was 5.3 to 5.5. Based on the resampling result, the diagnostic uncertainty zone ranged from 5.3 to 5.5, which means that horses with scores < 5 are not sufficiently sedated and horses with scores > 6 are sufficiently sedated.

Fig 4. Area under the curve (AUC) and two-graph ROC curve with the diagnostic uncertainty zone.

Interpretation of AUC: > 0.90 presents high discriminatory capacity. The two-graph ROC curve estimates the diagnostic uncertainty zone of the cut-off point according to the Youden index.

Frequency distribution of scores

There was a higher frequency of scores of 0 at baseline for the orbital opening and upper lip items (Fig 5), corroborating the specificity data (Table 7).

Fig 5. Frequency distribution of FaceSed scores before and after sedation/tranquilisation.

ACPL + ACPH—acepromazine in low and high doses, DL + DLM—detomidine in low dose and associated with methadone, DH and DHM—detomidine in high dose and associated with methadone.

Scores 1 and 2 predominated at peak sedation and/or intermediate sedation for all items.

Discussion

The results of this study show that the FaceSed is a simple and practical scale that offers reliability and validity to evaluate sedation over time in horses submitted to tranquilisation with acepromazine and to sedation with the α-2 agonist detomidine, with or without the opioid methadone, based on the validation criteria in the literature [39, 40, 56, 66–68]. Its main advantage is that it does not require interaction with the horse. However, differentiation between tranquilisation and a low degree of sedation may be difficult using both the FaceSed and the numerical rating scale.

The training phase undertaken by our evaluators resulted in good interobserver correlation. Even when evaluators are experienced and familiarised with the scale, this is not a guarantee of good reliability [69]. Therefore, training is strongly recommended even when using validated scales, as it improves reproducibility [70, 71].

According to the results of content validation by the external veterinary anaesthesiologists, the relevance of score 1 for partial opening of the ears was less than 0.5. However, as the other ear-related sub-items were approved, we decided to maintain it. The item upper lip relaxation was not considered relevant, possibly because upper lip drop is not described in other studies evaluating sedation. Thus, the consultants’ lack of familiarity may have led to the questioning of its relevance. Still, this item was maintained given its importance during the development of the FaceSed, and this was justified by the subsequent analyses. Content validity is used to identify how well item descriptions are formulated and cover the proposed theme [21], and in this study, FaceSed content validity was consolidated in a similar way to pain scales in animals [39, 41–43, 56].

Phase II was performed 11 months after Phase I because it was not run in parallel with the other previously published simultaneous studies [11, 49, 50]. We decided to also complement the validity assessment of the FaceSed for tranquilisation, and this was the only period in which the facilities, horses, and authors were available. However, all photos from Phases I and II were combined and randomised for evaluation by the observers without distinction of phases.

The sum of the FaceSed presented good to very good intraobserver reliability and moderate to very good interobserver reliability, which guarantees the repeatability and reproducibility of the FaceSed for future studies. This is in contrast with a previous study that evaluated facial expression in ridden horses [55]. That study suggested that reproducibility for eye evaluations was not very consistent, with Kappa values < 0.42 even with trained evaluators. This discrepancy between the two studies may be explained because the FaceSed is a simpler instrument with fewer and more easily identifiable descriptors of sedative expressions compared to facial expressions in ridden horses. Our results showed reduced interobserver reliability for the items ears, and lower and upper lips, which possibly contributed to the interobserver reliability of the FaceSed being slightly lower than that of numerical rating scale. The biases that could affect the reliability of an instrument are mainly a prolonged evaluation time, which makes the evaluators tired, inadequate description of the items, and finally, lack of practice of the evaluators [72, 73]. None of these biases were apparently the case in this study.

The fact that the numerical rating scale presented better inter-rater reliability than the FaceSed was surprising, since the reliability of unidimensional scales is usually not good [25]. The numerical rating scale was scored before the FaceSed, which excluded the bias of scoring it based on the facial score. However, the fact that the evaluators were previously trained to recognise the facial sedation characteristics described by the FaceSed may have improved the reliability of the numerical rating scale. Another point was that for some photos it was possible to identify sweating, and because the neck position might be apparent in some of the lateral view pictures, the observers might have been able to recognise the low HHAG% induced by sedation. Thus, the FaceSed may have been influenced by the previous evaluation of the numerical rating scale and by the neck position for some photos. This bias could be eliminated if the evaluations were performed twice and independently for each scale; however, this was not feasible considering the extensive data analysis. Conversely, if the FaceSed had been scored first, its sedation descriptors might have facilitated the appraisal of the numerical rating scale, leading to possible overrating of its reliability. Finally, the fact that the FaceSed and the numerical rating scale were scored sequentially may have overestimated their correlation; however, this is the usual procedure for concurrent criterion validation of instruments in the literature [38–40, 56, 67, 68].

When proposing a new scale, the ideal method to perform concurrent criterion validation is to compare it with validated methods in order to find out how much their scores correlate with the proposed scale [74]. Since, unlike for equine pain scales [39, 44, 63], there is no validated instrument to evaluate sedation in horses, the numerical rating scale and the HHAG% were used. The approach based on correlating the FaceSed against a unidimensional scale has been previously used when novel pain scales were elaborated for other species [41, 42, 56, 68]. Indeed, the FaceSed presented a high positive correlation with the numerical rating scale in all treatments, indicating the similarity in their magnitudes. The HHAG% data used in the present study have already been published in the simultaneous studies run in parallel to the present one [11]; they were used here because the HHAG% is the most established and objective method to assess sedation in horses [9, 12, 14], and therefore the closest to a ‘gold standard’ instrument to compare with the FaceSed. There was a high negative correlation between the FaceSed and HHAG% for the detomidine treatments and for all treatments combined and, as expected, a moderate correlation with acepromazine, as this drug does not reduce HHAG%.

Both the total FaceSed score and each of its items identified the facial changes of the horses over time both under tranquilisation and under sedation, since they presented higher scores at the peak and intermediate tranquilisation/sedation when compared to baseline and end of tranquilisation/sedation. The same was observed for numerical rating scale and HHAG%. These results are of importance as a responsive scale must detect the differences in scores in relation to the interventions that the instrument proposes to measure [75].

At the time-point of peak sedation, the FaceSed scores were lowest for tranquilisation (low and high doses of acepromazine: ACPL and ACPH), intermediate for moderate sedation with the low dose of detomidine (DL and DLM), and highest for the deepest sedation with the high dose of detomidine (DH and DHM). In contrast, HHAG%, a widely used measure to assess sedation, failed to differentiate tranquilisation with acepromazine from low and high sedation intensities with detomidine (with or without methadone), possibly because lowering of the head is not so evident in horses tranquilised with acepromazine.

In the principal component analysis, all items of the FaceSed followed the same trend in measuring sedation in horses, as they met the Kaiser criterion and presented load factors ≥ 0.50, eigenvalue > 1, and variance > 20% in only one dimension [59]. Also, the directions of the vectors indicate that all items identified sedation. The practical meaning is that all items are influenced by the intensity of sedation. This multivariate analysis prevents the inclusion of items that are not mutually associated [76].

Another way to evaluate the correlation between the items on the scale is the internal consistency by Cronbach’s α coefficient, which was high for the FaceSed. This analysis can be interpreted together with the principal component analysis, in which high values of Cronbach’s α coefficient usually occur in situations where few dimensions are identified [46]. The internal consistency investigates whether the items of the scale show a consistent or similar response, given by a good mutual correlation, which was observed in the FaceSed.

Differently from internal consistency, which confirmed the intercorrelation of the items, item-total correlation tests the homogeneity of the scale based on the correlation of a particular item with the scale as a whole, omitting the target item [46]. Therefore, the contribution of each item is analysed independently from the other items. The items of the FaceSed contributed homogeneously to the sum of the scale [56], because 0.3 is the minimum correlation for an item to have a significant role on the scale [46]. Values above 0.7 indicate that the item only repeats the trend of the scale and could be redundant [46]. All items, except orbital opening (0.73), were within this range, showing that orbital opening may be a restatement of other items. This item was maintained in the scale because the item-total correlation was close to the maximum limit and it was within the approval criteria of all other tests.

All items presented sensitivity to identify sedated horses. As for specificity, only the items orbital opening and upper lip relaxation were specific in detecting truly non-sedated animals. The items ears and lower lip were not specific, as scores of 1 were attributed at baseline, as shown in the frequency distribution data. The lack of specificity of these two items is a limitation of the FaceSed, as it indicates that horses in their natural and relaxed state present a false sedation characteristic [33].

To define the cut-off value of the FaceSed, the HHAG% was used as the predictive value, where horses with an HHAG% ≤ 50% were considered sufficiently sedated [14]. According to the Youden index of the FaceSed, horses scoring > 5 are adequately sedated for standing procedures.

Although the horse grimace scale [31], along with other facial scales [26, 32, 33, 35, 36], was the starting point to develop the FaceSed, it was not feasible simply to invert the scores of the horse grimace scale and use the same instrument to assess sedation, because the FaceSed was developed with descriptors specific to sedation in the horses of the present study. Pain produces muscle contraction, whereas sedation produces muscle relaxation, so only the neutral point (normal state) is coincident in both conditions and scales; i.e., with sedation in the FaceSed the lower and upper lips are relaxed, whereas with pain they are contracted [31]. According to the horse grimace scale, ears are stiff and directed backwards in horses in pain [31], whereas they are relaxed and open in horses under sedation (FaceSed). Therefore, during the process of content validation it was necessary to include different descriptors.

With regard to ear position, we observed that in a sedated horse the lateral distance between the tips of the ears is widened, as described by the EquiFACS (Equine Facial Action Coding System) in relaxed horses, with reduced visibility of the inner ear in the lateral view [54]. The movement [26] and reaction to touch [9] of the ears have already been investigated as indicators of sedation; however, their static positioning has not been well described. Evaluating the ability of the ears to move, without considering their position, may not discriminate between a horse that is sedated and one that is in pain. In pain, the distance between the tips of the ears is also increased, but with the ears directed backwards or asymmetrical instead [31, 44].

The facial action unit orbital closure, described by the EquiFACS in relaxed horses, occurs due to relaxation of the levator palpebrae superioris muscle of the upper eyelid [54]. Orbital opening/closure was the only item coincident between the horse grimace scale and the FaceSed. It should not be confused with the orbital tightening shown in the horse grimace scale of horses in pain, where there is tension above the eye area due to contraction of the muscles in that region [31]. Lower lip relaxation presented low specificity in the FaceSed, which may be explained by its occurrence in non-sedated, relaxed horses as well [54].

Upper lip relaxation accompanies the vertical stretching of the nostrils, also described in the EquiFACS as a change in the conformation of the edges of the nostrils from a curved to a more elongated shape [54]. However, in EquiFACS this action occurs together with elevation of the nostril [54, 55], and not with relaxation of the lip as observed in the sedated horses of the present study. Including a specific item for nose changes might be prone to human biases, because when people process facial expressions, noses attract little attention, both in people and in other animals [77]. The dynamic perception of emotions by human beings is modulated by their main area of interest, which starts with analysis of the eyes and mouth for both joy and fear, and so sedation signals of the FaceSed could be missed [77, 78]. Facial scales developed to identify pain or sedation may be susceptible to human biases if they are not applied in a systematic way such as EquiFACS [54].

Our study is not free of limitations. One bias that needs to be considered is that all evaluators knew that the objective of the study was to identify sedation, and two evaluators were present during the experimental phase. To reduce expectation bias, evaluators were independent and blinded to the treatments. Furthermore, as previously mentioned, sweating and the position of the neck might be apparent in the lateral-view photos, so at some time-points the depth of sedation might be identifiable, which could affect scoring decisions. The first limitation of the study was that the horses were docile and acclimated to the site, the experimental handling and face-to-face contact with evaluators. This may have contributed to the false interpretation that the horses were sedated at baseline, thereby reducing specificity values. To overcome this drawback, the reliability of the FaceSed should be tested in different scenarios and environments, with various types of handling and by people other than those the horses are used to, to confirm that the instrument maintains the same reliability. A second limitation was that the photographic records were not taken from videos, and still photographs may not capture the full expression of facial regions acting or relaxing at the moment of evaluation [43]. Thus, it would be advisable to apply the scale using short videos instead of photographs, or to score on-site. Either way, there is still the bias of the photographer in determining the moments of image capture. Furthermore, measuring movement in photographs is difficult, and breed differences (e.g. eye wrinkle) may be a confounding factor when assessing photos [79]. Another limitation of the FaceSed, shared with other facial scales, is that it would be difficult to apply during dental and ophthalmic interventions.

A final limitation, mentioned above, was that, like the numerical rating scale and HHAG%, the FaceSed was not capable of differentiating tranquilisation from a low degree of sedation. Further refinements in facial recognition, which may include not only visual analysis of images but also geometric morphometric approaches, as reported in cats [80], or, in the future, deep learning tools [81], might be useful to differentiate these stages of response to drugs.

In conclusion, the FaceSed presents content, criterion and construct validity and adequate intra- and interobserver reliability to identify both tranquilisation and sedation in horses when assessed by trained anaesthesiologists. Further studies in clinical and other experimental scenarios, and assessment by inexperienced observers, are needed to confirm whether facial sedative characteristics evaluated on-site yield similar results. At this stage, the FaceSed is a short, easy-to-apply scale and may be useful for clinical practice and research. A further advantage is that it requires little time to apply and no interaction with the horse.

Supporting information

S1 Appendix. Flowchart of the methodology of the data collection of a facial sedation scale in horses.

(TIF)

S2 Appendix. Guidelines for evaluation of a facial sedation scale in horses.

(DOCX)

S1 Data

(XLSX)

Acknowledgments

We thank the colleagues who kindly participated in the EquiSed content validation, Marilda Onghero Taffarel, Flávia Augusta de Oliveira and Carlize Lopes, and Altamiro Rosam for his care of and dedication to the animals.

Data Availability

Data are included as a Supporting Information file (Data.FaceSed).

Funding Statement

This work was supported by the São Paulo Research Foundation (FAPESP): doctoral scholarship (ARO), protocol 2017/16208-0; thematic project (SPLL), protocol 2017/12815-0; and post-doctoral grant (MGM), protocol 2017/01425-6. Funder website: https://fapesp.br/ The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Potter JJ, Macfarlane PD, Love EJ, Tremaine H, Taylor PM, Murrell JC. Preliminary investigation comparing a detomidine continuous rate infusion combined with either morphine or buprenorphine for standing sedation in horses. Vet Anaesth Analg. 2016;43: 189–194. 10.1111/vaa.12316 [DOI] [PubMed] [Google Scholar]
  • 2.Johnston GM, Steffey E. Confidential enquiry into perioperative equine fatalities (CEPEF). Vet Surg. 1995;24: 518–519. [DOI] [PubMed] [Google Scholar]
  • 3.Johnston GM, Eastment JK, Wood JLN, Taylor PM. The confidential enquiry into perioperative equine fatalities (CEPEF): mortality results of phases 1 and 2. Vet Anaesth Analg. 2002;29: 159–170. 10.1046/j.1467-2995.2002.00106.x [DOI] [PubMed] [Google Scholar]
  • 4.Ringer SK, Portier KG, Fourel I, Bettschart-Wolfensberger R. Development of a romifidine constant rate infusion with or without butorphanol for standing sedation of horses. Vet Anaesth Analg. 2012;39: 12–20. 10.1111/j.1467-2995.2011.00681.x [DOI] [PubMed] [Google Scholar]
  • 5.Ringer SK, Portier K, Torgerson PR, Castagno R, Bettschart-Wolfensberger R. The effects of a loading dose followed by constant rate infusion of xylazine compared with romifidine on sedation, ataxia and response to stimuli in horses. Vet Anaesth Analg. 2013;40: 157–165. 10.1111/j.1467-2995.2012.00784.x [DOI] [PubMed] [Google Scholar]
  • 6.Medeiros LQ, Gozalo-Marcilla M, Taylor PM, Campagnol D, De Oliveira FA, Watanabe MJ, et al. Sedative and cardiopulmonary effects of dexmedetomidine infusions randomly receiving, or not, butorphanol in standing horses. Vet Rec. 2017;181: 402. 10.1136/vr.104359 [DOI] [PubMed] [Google Scholar]
  • 7.Benredouane K, Ringer SK, Fourel I, Lepage OM, Portier KG, Bettschart-Wolfensberger R. Comparison of xylazine-butorphanol and xylazine-morphine-ketamine infusions in horses undergoing a standing surgery. Vet Rec. 2011;169: 364. 10.1136/vr.d5333 [DOI] [PubMed] [Google Scholar]
  • 8.Bryant CE, England GCW, Clarke KW. A comparison of the sedative effects of medetomidine and xylazine in the horse. Vet Anaesth Analg. 1991;18: 55–57. 10.1111/j.1467-2995.1991.tb00511.x [DOI] [PubMed] [Google Scholar]
  • 9.Gozalo-Marcilla M, Luna SP, Crosignani N, Filho JNP, Possebon FS, Pelligand L, et al. Sedative and antinociceptive effects of different combinations of detomidine and methadone in standing horses. Vet Anaesth Analg. 2017;44: 1116–1127. 10.1016/j.vaa.2017.03.009 [DOI] [PubMed] [Google Scholar]
  • 10.Clutton RE. Opioid Analgesia in Horses. Vet Clin North Am—Equine Pract. 2010;26: 493–514. 10.1016/j.cveq.2010.07.002 [DOI] [PubMed] [Google Scholar]
  • 11.Gozalo-Marcilla M, de Oliveira AR, Fonseca MW, Possebon FS, Pelligand L, Taylor PM, et al. Sedative and antinociceptive effects of different detomidine constant rate infusions, with or without methadone in standing horses. Equine Vet J. 2018;51: 530–536. 10.1111/evj.13054 [DOI] [PubMed] [Google Scholar]
  • 12.Schauvliege S, Cuypers C, Michielsen A, Gasthuys F, Gozalo-Marcilla M. How to score sedation and adjust the administration rate of sedatives in horses: a literature review and introduction of the Ghent Sedation Algorithm. Vet Anaesth Analg. 2019;46: 4–13. 10.1016/j.vaa.2018.08.005 [DOI] [PubMed] [Google Scholar]
  • 13.Clarke KW, England GCW, Goossens L. Sedative and cardiovascular effects of romifidine, alone and in combination with butorphanol, in the horse. Vet Anaesth Analg. 1991;18: 25–29. 10.1111/j.1467-2995.1991.tb00008.x [DOI] [Google Scholar]
  • 14.Ringer SK, Portier KG, Fourel I, Bettschart-Wolfensberger R. Development of a xylazine constant rate infusion with or without butorphanol for standing sedation of horses. Vet Anaesth Analg. 2012;39: 1–11. 10.1111/j.1467-2995.2011.00653.x [DOI] [PubMed] [Google Scholar]
  • 15.Costa GL, Cristarella S, Quartuccio M, Interlandi C. Anti-nociceptive and sedative effects of romifidine, tramadol and their combination administered intravenously slowly in ponies. Vet Anaesth Analg. 2015;42: 220–225. 10.1111/vaa.12210 [DOI] [PubMed] [Google Scholar]
  • 16.L’Ami JJ, Vermunt LE, van Loon JPAM, Sloet van Oldruitenborgh-Oosterbaan MM. Sublingual administration of detomidine in horses: Sedative effect, analgesia and detection time. Vet J. 2013;196: 253–259. 10.1016/j.tvjl.2012.08.016 [DOI] [PubMed] [Google Scholar]
  • 17.Ranheim B, Risberg AI, Spadavecchia C, Landsem R, Haga HA. The pharmacokinetics of dexmedetomidine administered as a constant rate infusion in horses. J Vet Pharmacol Ther. 2015;38: 93–96. 10.1111/jvp.12157 [DOI] [PubMed] [Google Scholar]
  • 18.Hamm D, Jöchle W. Sedation and analgesia with dormosedan® (Detomidine hydrochloride) or acepromazine for suturing of the vulvar lips in mares (Caslick’s surgery). J Equine Vet Sci. 1991;11: 86–88. 10.1016/S0737-0806(07)80136-5 [DOI] [Google Scholar]
  • 19.England GC, Clarke KW, Goossens L. A comparison of the sedative effects of three alpha 2-adrenoceptor agonists (romifidine, detomidine and xylazine) in the horse. J Vet Pharmacol Ther. 1992;15: 194–201. 10.1111/j.1365-2885.1992.tb01007.x [DOI] [PubMed] [Google Scholar]
  • 20.Solano AM, Valverde A, Desrochers A, Nykamp S, Boure LP. Behavioural and cardiorespiratory effects of a constant rate infusion of medetomidine and morphine for sedation during standing laparoscopy in horses. Equine Vet J. 2009;41: 153–159. 10.2746/042516408x342984 [DOI] [PubMed] [Google Scholar]
  • 21.McDowell I. Measuring health: A guide to rating scales and questionnaires. 3rd ed. New York: Oxford University Press; 2006. [Google Scholar]
  • 22.Love EJ, Murrell J, Whay HR. Thermal and mechanical nociceptive threshold testing in horses: A review. Vet Anaesth Analg. 2011;38: 3–14. 10.1111/j.1467-2995.2010.00580.x [DOI] [PubMed] [Google Scholar]
  • 23.Waltz C, Strickland O, Lenz E. Measurement in nursing and health research. 4th ed. New York: Springer Publishing Company; 2010. [Google Scholar]
  • 24.Clarke KW, Taylor PM. Detomidine: A new sedative for horses. Equine Vet J. 1986;18: 366–370. 10.1111/j.2042-3306.1986.tb03655.x [DOI] [PubMed] [Google Scholar]
  • 25.Holton LL, Scott EM, Nolan AM, Reid J, Welsh E, Flaherty D. Comparison of three methods used for assessment of pain in dogs. J Am Vet Med Assoc. 1998;212: 61–66. [PubMed] [Google Scholar]
  • 26.Marly C, Bettschart-Wolfensberger R, Nussbaumer P, Moine S, Ringer SK. Evaluation of a romifidine constant rate infusion protocol with or without butorphanol for dentistry and ophthalmologic procedures in standing horses. Vet Anaesth Analg. 2014;41: 491–497. 10.1111/vaa.12174 [DOI] [PubMed] [Google Scholar]
  • 27.Gozalo-Marcilla M, Luna SP, Gasthuys F, Pollaris E, Vlaminck L, Martens A, et al. Clinical applicability of detomidine and methadone constant rate infusions for surgery in standing horses. Vet Anaesth Analg. 2019;46: 325–334. 10.1016/j.vaa.2019.01.005 [DOI] [PubMed] [Google Scholar]
  • 28.Mama KR, Grimsrud K, Snell T, Stanley S. Plasma concentrations, behavioural and physiological effects following intravenous and intramuscular detomidine in horses. Equine Vet J. 2009;41: 772–777. 10.2746/042516409x421624 [DOI] [PubMed] [Google Scholar]
  • 29.Grimsrud KN, Mama KR, Steffey EP, Stanley SD. Pharmacokinetics and pharmacodynamics of intravenous medetomidine in the horse. Vet Anaesth Analg. 2012;39: 38–48. 10.1111/j.1467-2995.2011.00669.x [DOI] [PubMed] [Google Scholar]
  • 30.Darwin C. General principles of expression. Expression of the emotion in man and animals. London: John Murray; 1872. p. 374. [Google Scholar]
  • 31.Dalla Costa E, Minero M, Lebelt D, Stucke D, Canali E, Leach MC. Development of the Horse Grimace Scale (HGS) as a pain assessment tool in horses undergoing routine castration. Hillman E, editor. PLoS One. 2014;9: e92281. 10.1371/journal.pone.0092281 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Vandierendonck MC, Van Loon JPAM. Monitoring acute equine visceral pain with the Equine Utrecht University Scale for Composite Pain Assessment (EQUUS-COMPASS) and the Equine Utrecht University Scale for Facial Assessment of Pain (EQUUS-FAP): A validation study. Vet J. 2016;216: 175–177. 10.1016/j.tvjl.2016.08.004 [DOI] [PubMed] [Google Scholar]
  • 33.Gleerup KB, Forkman B, Lindegaard C, Andersen PH. An equine pain face. Vet Anaesth Analg. 2015;42: 103–114. 10.1111/vaa.12212 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Mellor DJ. Mouth pain in horses: Physiological foundations, behavioural indices, welfare implications, and a suggested solution. Animals. 2020;10. 10.3390/ani10040572 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Figueiredo J, Muir W, Smith J, Wolfrom G. Sedative and analgesic effects of romifidine in horses. Int J Appl Res Vet Med. 2005;3: 249–258. [Google Scholar]
  • 36.Sacks M, Ringer SK, Bischofberger AS, Berchtold SM, Bettschart-Wolfensberger R. Clinical comparison of dexmedetomidine and medetomidine for isoflurane balanced anaesthesia in horses. Vet Anaesth Analg. 2017;44: 1128–1138. 10.1016/j.vaa.2016.12.061 [DOI] [PubMed] [Google Scholar]
  • 37.Freeman SL, England GCW. Investigation of romifidine and detomidine for the clinical sedation of horses. Vet Rec. 2000;147: 507–511. 10.1136/vr.147.18.507 [DOI] [PubMed] [Google Scholar]
  • 38.Brondani JT, Mama KR, Luna SPL, Wright BD, Niyom S, Ambrosio J, et al. Validation of the English version of the UNESP-Botucatu multidimensional composite pain scale for assessing postoperative pain in cats. BMC Vet Res. 2013;9: 143. 10.1186/1746-6148-9-143 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Taffarel MO, Luna SPL, de Oliveira FA, Cardoso GS, Alonso J de M, Pantoja JC, et al. Refinement and partial validation of the UNESP-Botucatu multidimensional composite pain scale for assessing postoperative pain in horses. BMC Vet Res. 2015;11: 83. 10.1186/s12917-015-0395-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.de Oliveira FA, Luna SPL, do Amaral JB, Rodrigues KA, Sant’Anna AC, Daolio M, et al. Validation of the UNESP-Botucatu unidimensional composite pain scale for assessing postoperative pain in cattle. BMC Vet Res. 2014;10: 200. 10.1186/s12917-014-0200-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Luna SPL, de Araújo AL, da Nóbrega Neto PI, Brondani JT, de Oliveira FA, dos Santos Azerêdo LM, et al. Validation of the UNESP-Botucatu pig composite acute pain scale (UPAPS). PLoS One. 2020;15: 1–27. 10.1371/journal.pone.0233552 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Silva NEOF, Trindade PHE, Oliveira AR, Taffarel MO, Moreira MAP, Denadai R, et al. Validation of the NESP-Botucatu composite scale to assess acute postoperative abdominal pain in sheep (USAPS). PLoS One. 2020;15: 1–27. 10.1371/journal.pone.0239622 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Evangelista MC, Watanabe R, Leung VSY, Monteiro BP, O’Toole E, Pang DSJ, et al. Facial expressions of pain in cats: the development and validation of a Feline Grimace Scale. Sci Rep. 2019;9: 1–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.van Loon JPAM, Van Dierendonck MC. Monitoring acute equine visceral pain with the Equine Utrecht University Scale for Composite Pain Assessment (EQUUS-COMPASS) and the Equine Utrecht University Scale for Facial Assessment of Pain (EQUUS-FAP): A scale-construction study. Vet J. 2015;206: 356–364. 10.1016/j.tvjl.2015.08.023 [DOI] [PubMed] [Google Scholar]
  • 45.Häger C, Biernot S, Buettner M, Glage S, Keubler LM, Held N, et al. The Sheep Grimace Scale as an indicator of post-operative distress and pain in laboratory sheep. PLoS One. 2017;12: 1–15. 10.1371/journal.pone.0175839 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Streiner DL, Norman GR, Cairney J. Health measurement scales: A practical guide to their development and use. 5th ed. New York: Oxford University Press; 2015. [Google Scholar]
  • 47.DeVellis RF. Scale development theory and applications. SAGE Publ. 2016;4: 256. [Google Scholar]
  • 48.de Oliveira AR, Gozalo-Marcilla M, Ringer SK, Schauvliege S, Fonseca MW, Trindade PHE, et al. Development, validation and reliability of a sedation scale in horses (EquiSed). Front Vet Sci. 2021. 10.3389/fvets.2021.611729 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Gozalo-Marcilla M, Moreira da Silva R, Pacca Loureiro Luna S, Rodrigues de Oliveira A, Werneck Fonseca M, Peporine Lopes N, et al. A possible solution to model nonlinearity in elimination and distributional clearances with α2-adrenergic receptor agonists: Example of the intravenous detomidine and methadone combination in sedated horses. J Vet Pharmacol Ther. 2019; 1–7. 10.1111/jvp.12815 [DOI] [PubMed] [Google Scholar]
  • 50.Gozalo-Marcilla M, Luna SPL, Moreira da Silva R, Crosignani N, Lopes NP, Taylor PM, et al. Characterisation of the in vivo interactions between detomidine and methadone in horses: Pharmacokinetic and pharmacodynamic modelling. Equine Vet J. 2019;51: 517–529. 10.1111/evj.13031 [DOI] [PubMed] [Google Scholar]
  • 51.StatsToDo. 2017 [cited 3 Apr 2017]. https://www.statstodo.com/SSizUnpairedDiff_Pgm.php.
  • 52.de Oliveira FA, Pignaton W, Teixeira-Neto FJ, de Queiroz-Neto A, Puoli-Filho JNP, Scognamillo MVR, et al. Antinociceptive and behavioral effects of methadone alone or in combination with detomidine in conscious horses. J Equine Vet Sci. 2014;34: 380–386. 10.1016/j.jevs.2013.07.012 [DOI] [Google Scholar]
  • 53.Sorteador. 2017 [cited 2 Feb 2017]. https://sorteador.com.br/.
  • 54.Wathan J, Burrows AM, Waller BM, McComb K. EquiFACS: The Equine Facial Action Coding System. PLoS One. 2015;10: e0131738. 10.1371/journal.pone.0131738 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Mullard J, Berger JM, Ellis AD, Dyson S. Development of an ethogram to describe facial expressions in ridden horses (FEReq). J Vet Behav Clin Appl Res. 2017;18: 7–12. 10.1016/j.jveb.2016.11.005 [DOI] [Google Scholar]
  • 56.Brondani JT, Luna SPL, Padovani CR. Refinement and initial validation of a multidimensional composite scale for use in assessing acute postoperative pain in cats. Am J Vet Res. 2011;72: 174–183. 10.2460/ajvr.72.2.174 [DOI] [PubMed] [Google Scholar]
  • 57.Miot HA. Agreement analysis in clinical and experimental studies. J Vasc Bras. 2016;15: 89–92. PMID: 29930571 [Google Scholar]
  • 58.Altman D. Some common problems in medical research. In: Practical statistics for medical research. London: Chapman & Hall; 1991. pp. 404–408. [Google Scholar]
  • 59.Kaiser HF. The varimax criterion for analytic rotation in factor analysis. Psychometrika. 1958;23: 187–200. 10.1007/BF02289233 [DOI] [Google Scholar]
  • 60.Hair J, Black W, Babin BJ, Anderson R, Tatham R. Multivariate data analysis. Upper Saddle River: Prentice hall; 1998. [Google Scholar]
  • 61.Cronbach L. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16: 297–333. [Google Scholar]
  • 62.Streiner DL. Starting at the beginning: An introduction to coefficient alpha and internal consistency. J Pers Assess. 2003;80: 99–103. 10.1207/S15327752JPA8001_18 [DOI] [PubMed] [Google Scholar]
  • 63.Bussières G, Jacques C, Lainay O, Beauchamp G, Leblond A, Cadoré J-L, et al. Development of a composite orthopaedic pain scale in horses. Res Vet Sci. 2008;85: 294–306. 10.1016/j.rvsc.2007.10.011 [DOI] [PubMed] [Google Scholar]
  • 64.Streiner DL, Cairney J. What’s under the ROC? An introduction to receiver operating characteristics curves. Can J Psychiatry. 2007;52: 121–128. 10.1177/070674370705200210 [DOI] [PubMed] [Google Scholar]
  • 65.Mallat J, Meddour M, Durville E, Lemyze M, Pepy F, Temime J, et al. Decrease in pulse pressure and stroke volume variations after mini-fluid challenge accurately predicts fluid responsiveness. Br J Anaesth. 2015;115: 449–456. 10.1093/bja/aev222 [DOI] [PubMed] [Google Scholar]
  • 66.Sessler CN, Gosnell MS, Grap MJ, Brophy GM, O’Neal P V., Keane KA, et al. The Richmond Agitation-Sedation Scale: Validity and reliability in adult intensive care unit patients. Am J Respir Crit Care Med. 2002;166: 1338–1344. 10.1164/rccm.2107138 [DOI] [PubMed] [Google Scholar]
  • 67.De Lemos J, Tweeddale M, Chittock D. Measuring quality of sedation in adult mechanically ventilated critically ill patients: The Vancouver Interaction and Calmness Scale. J Clin Epidemiol. 2000;53: 908–919. 10.1016/s0895-4356(00)00208-0 [DOI] [PubMed] [Google Scholar]
  • 68.Ashkenazy S, DeKeyser-Ganz F. Assessment of the reliability and validity of the Comfort Scale for adult intensive care patients. Hear Lung. 2011;40: 44–51. 10.1016/j.hrtlng.2009.12.011 [DOI] [PubMed] [Google Scholar]
  • 69.Roughan J V., Flecknell PA. Training in behaviour-based post-operative pain scoring in rats—An evaluation based on improved recognition of analgesic requirements. Appl Anim Behav Sci. 2006;96: 327–342. 10.1016/j.applanim.2005.06.012 [DOI] [Google Scholar]
  • 70.Zhang EQ, Leung VSY, Pang DSJ. Influence of rater training on inter- and intrarater reliability when using the rat grimace scale. J Am Assoc Lab Anim Sci. 2019;58: 178–183. 10.30802/AALAS-JAALAS-18-000044 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Mich PM, Hellyer PW, Kogan L, Schoenfeld-Tacher R. Effects of a pilot training program on veterinary students’ pain knowledge, attitude, and assessment skills. J Vet Med Educ. 2010;37: 358–368. 10.3138/jvme.37.4.358 [DOI] [PubMed] [Google Scholar]
  • 72.Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: Theory and application. Am J Med. 2006;119: 166. 10.1016/j.amjmed.2005.10.036 [DOI] [PubMed] [Google Scholar]
  • 73.Martin P, Bateson P. Measuring behaviour: An introductory guide. 3rd ed. Cambridge: Cambridge University Press; 2007. [Google Scholar]
  • 74.de Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in medicine: A practical guide. Cambridge: Cambridge University Press; 2011. [Google Scholar]
  • 75.Hays RD, Hadorn D. Responsiveness to change: an aspect of validity, not a separate dimension. Qual Life Res. 1992;1: 73–75. 10.1007/BF00435438 [DOI] [PubMed] [Google Scholar]
  • 76.Plichta SB, Kelvin EA. Munro’s statistical methods for health care research. 6th ed. Philadelphia: Wolters Kluwer Health; 2011. [Google Scholar]
  • 77.Correia-Caeiro C, Guo K, Mills DS. Perception of dynamic facial expressions of emotion between dogs and humans. Anim Cogn. 2020;23: 465–476. 10.1007/s10071-020-01348-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Kano F, Tomonaga M. Face scanning in chimpanzees and humans: continuity and discontinuity. Anim Behav. 2010;79: 227–235. 10.1016/j.anbehav.2009.11.003 [DOI] [Google Scholar]
  • 79.Schanz L, Krueger K, Hintze S. Sex and age don’t matter, but breed type does-factors influencing eye wrinkle expression in horses. Front Vet Sci. 2019;6. 10.3389/fvets.2019.00154 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Finka LR, Luna SP, Brondani JT, Tzimiropoulos Y, McDonagh J, Farnworth MJ, et al. Geometric morphometrics for the study of facial expressions in non-human animals, using the domestic cat as an exemplar. Sci Rep. 2019;9: 1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Mathis MW, Mathis A. Deep learning tools for the measurement of animal behavior in neuroscience. Curr Opin Neurobiol. 2020;60: 1–11. 10.1016/j.conb.2019.10.008 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Chang-Qing Gao

1 Dec 2020

PONE-D-20-32183

Development and validation of the facial scale (facesed) to evaluate sedation in horses

PLOS ONE

Dear Dr. Luna,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 15 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Chang-Qing Gao

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere:

'The only already published data was the measurement of head height above the ground (HHAG) collected in situ. However, in the original study, this data is expressed in centimeters and in this study data is reported in percentage. This information has been included in the methods section. The same data (HHAG%) is under submission in another article about a behavioural sedation scale (EquiSed).

The inclusion of HHAG data in this manuscript does not imply in dual publication because data was used only for comparison against the proposed scale.'

Please clarify whether this publication was peer-reviewed and formally published.

If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Interesting study done as part of/in tandem with another study on anaesthesia, where the authors develop and validate a scale for sedation levels in horses through facial cues. The work follows previous publications on the Horse Grimace Scale and combines this with a couple other works to build the FaceSed. The work done to validate the scale is comprehensive, but I think several clarifications and more information are needed at this point. I also think the authors need to better justify the need for this scale as opposed to just using the HGS and invert the scores (in which this study could simply present an application of HGS for sedation contexts).

Line 16: Sorry, maybe I'm missing something, but I don't understand the author contributions marked with "&". There is only one author marked with "&", so contributed equally to whom?

Line 51: I would suggest adding a short paragraph on the different sedation agents usually used here for horses and why, and then in the methods justify based on this why the authors chose the sedation agents for this study. (I am expecting these to be for example the most common ones).

Line 53-57: Please expand on each of these scales, including what they are, how they measure sedation, differences between them, and why, in the author's opinion they are more or less objective. Also, mention briefly if they were used after development (cite all studies if not many were done). This is the state of the art of this introduction, so needs more information as not everyone might be familiar with all the scales published till date. Furthermore, this will set the scene to then explain why the current new method is better than all of these past ones.

Line 54: I guess this will be replied once there is more information about each scale, but is HHAG a scale or a one item measurement? The references are for studies, I assume, applying this measure. But the reference here should be only the publication that developed the scale. So if I want to use this scale, which publication was the pioneer in using this measurement? The subsequent studies can then be cited in applications of this HHAG tool (and the same goes for the other scales).

Line 58-62: why were these scales deemed convenient? In what way? Also, please add which scales are used for clinical or experimental purposes. The name of each scale, whenever mentioned, should be in the text, so the reader doesn't need to go back to the references every time.

Line 61-62: Do the authors mean here that in order to use some scales, the handler needs to administer certain visual or auditory stimuli to the anaesthetised animal? If so, please clarify this idea, as it seems a bit vague at the moment and give examples of what kind of visual/auditory stimuli are part of the scales. Also, if I understood this correctly, there is another potential problem other than the ones cited by the author already, as delivery of these stimuli (e.g. intensity, quality) will probably vary a lot with the handler, which can create a confounding variable. For example, a lower intensity stimuli, means a lower response and assumed higher sedation, when this might not be true.

Line 63: The content related to facial expressions within each of the scales mentioned above needs to be expanded. Please include, which scales contain facial expression attributes, and give examples for each scale of these facial attributes.

Line 63-64: the second part of this sentence is a bit confusing for me. These scales were developed for the sole purpose of identifying sedation, so how can they not be relevant? I am guessing by the next sentence, the authors mean validation, consistency, reliability, etc., so perhaps the word relevance is better replaced by something else, like psychometric attributes or something similar? The authors vaguely pass the main important idea here, which is that anyone can develop any scale, but the quality of it as a tool for measuring a construct has to be assessed after development. This needs to be made clearer.

Line 63: I know the authors intend to develop a tool, but more background on the tool development is needed. Perhaps around here, the authors need to explain why the face and facial expressions are relevant for 1) pain and thus 2) evaluating sedation, but also 3) for horses. This needs to be argued well, since some scales include facial attributes and others not. There is also the issue of horses being a prey species and hence, facial signs of pain might not be present (adaptively, they would not be present). What is so special about horse faces (and faces in general) that makes them good indicators for sedation scales? I would suggest here maybe mention some human literature on facial expressions, and/or what we know about horse making use of their facial expressions in general (there are some studies on this). This can then be linked with for example, the HGS that is already mentioned on line 66.

Line 66: It is great to have these examples, but please mention which ones are being cited. Also, while the examples are important, the reason why a scale needs validation is not because others validated their scales before. So please add why it is important to validate a scale.

Line 69-70: why are they important? Please expand.

Line 74: what criteria? please mention them again. As these are the goals of this study, it has to be clear what the authors propose on doing. The prediction is not really a prediction at the moment. It needs to actually predict what the authors think it's going to measure in the different phases of the experiment, and be justified as well based either on lit or logical statements (if lit is not available). Also, the "why" of this study needs to be added to the goal.

Line 93: Was phase I and phase II done in different days? The same individual was thus subjected to all the treatments, i.e. detomidine, detomidine+methadone, acepromazine, in different days, correct? Please make this clear here.

Also, make it clear that what the current manuscript is reporting, is only phase II, and phase I is described only for understanding of phase II. This is important in order to not plagiarise the work published in phase I.

Line 110: how was the baseline decided?

Line 113: What does in situ registration means?

Line 117: I don't think the links for this and the power sample above need to be in text, they can be transformed as a reference - maybe check PLOS ONE referencing guide or a style book for proper in-text citation of websites.

Line 128-131: I'm assuming this was done in phase I but is part of the study reported in this manuscript, correct? This needs to be made clearer. The authors can simply add something like "For the previous study...", "For this study...". Again, it is important to understand what belongs to the study here reported and the study previously published.

Line 130: Why these time points? Is there previous evidence in the literature indicating these time points as ideal sampling?

Line 134: I was surprised here to see that phase I and II had almost a year in between. This information should come before phase I description. Why was there such a long time interval? This means that all the horses were a year older (or just about).

Line 137: Please include what was exactly measured here. Was there a baseline point, HHAG, sedation evaluations and pictures taken in 6 time points, from 5 till 120min? Why here there were less time points?

Line 143: I am a bit confused here, what is the main study here referring to? The previously published one? I would suggest the authors find a way of better distinguishing between these two studies (e.g. give it a name) and then refer to it consistently and clearly define what was in one study and what is in the current study. As a reviewer, I can't really comment on a previous published study in the same way I should comment on the present work, but there are several moments I am not sure which one is which. I think it's great to take advantage of previous studies to collect data, but the reporting of these studies should be carefully done. As I mentioned before, there are auto-plagiarism concerns when publishing twice the same work, and also it can give the impression this is somehow going towards the practice of salami slicing (see for example: https://doi.org/10.1038/nmat1305 or https://doi.org/10.1007/s10459-019-09876-7). Both in the introduction (goals) and methods, throughout the text, it needs to be perfectly clear what was the goal of the previous published study, what is the goal of this study, how do they differ, and why are they being published in separate articles.

Line 152: I am unsure what reference 28 is supposed to mean here, since it refers to a conference presentation...? First, if the reference is correct, something like "personal communication" should be added, since this is not a peer-reviewed published publication and it is not accessible. In fact, according to PLOS ONE guidelines, this should not even be cited:

Do not cite the following sources in the reference list:

• Personal communications (these should be supported by a letter from the relevant authors but not included in the reference list)

If there are more cases of conference presentations, please correct them.

Second, the "degree of sedation" evaluation criteria needs to be explained (either here or in supplementary materials).

Line 155: Why wasn't the end of sedation also determined through plasmatic concentration?

Line 156-161: The concentration of acepromazine was not measured, correct? If so, the authors shouldn't mention "reduction in concentration", since this was not measured. I am also a bit unsure about the way the justification for the time points is being presented. Obviously, the time points were chosen a priori since they are so uniform (i.e. every half hour), so unless there are studies showing that this is indeed such a regular action of the sedation agents, this needs to be corrected. It is important to either explain this differently or correct the justification for each time point, as at the moment it incurs in circular reasoning, i.e. the authors chose 60m, 90m, 120m a priori, then make the degree of sedation scores fit these points and use it to justify why these points were chosen. In order to correct this, either simply assume these time points were chosen for convenience or clearly explain how the degree of sedation scores were calculated (it is ok to go in supplem. materials if needed) and how much they had to increase/decrease in each point. And particularly, carefully justify why it just so happens to be such convenient time points for peak and so on.

Line 164: why is it relevant the 3 other coders are from different institutions and hold this diploma for scoring of facial expressions? All evaluators are authors of this manuscript, so is it right to assume all evaluators were aware of the goal of the coding? Also, was any of the evaluators present on site during the sedation experiments? If so, please add this information.

Line 170: Which author randomised and selected the photos? If one of the evaluators also did this, then please clearly state that this evaluator was not blind to the treatments. If not, please add who did this step.

Line 175: How many rounds of coding were needed to obtain 80%? Was this ICC test? If not, which correlation type was used? And what was the intra-observer reliability for the training? (I'm assuming that the evaluations done one month apart were for this - if not, please explain why that was done). Was the intra-observer taken into account before moving to the test coding?

Line 178-180: This is really not relevant information. What we need to know is what information was given to the evaluators with each photo pair (if any).

Line 181: Can these instructions be added as supplementary materials please?

Line 183: Is the NRS used here published anywhere? If so, please provide references. If not, please add the scale in supplem. materials. In any case, briefly describe what this scale is and how is it applied. Were all the evaluators very familiar with this scale?

Line 188: So the FaceSed is based on the HGS? Please make this clear (instead of vaguely indicating a reference that makes the reader go down to the reference list to understand what is being said), as it is an important methodological step. Also make it clear that eye tightening, strained mouth and stiff ears are indicators of pain in the HGS. Finally, please add more specific descriptions of what these terms mean exactly (according to the HGS).

Line 189: When I read this, the first thought in my mind was: why do we need the FaceSed then, since we can just invert the scores from the HGS? Maybe rephrase this sentence or explain better how it is any different from the HGS. If FaceSed is basically scoring the relaxation of muscles from an apex point to a neutral point, then we don't really need it, as muscles do not contract and relax in different ways. How does FaceSed differ from an inverted HGS? Is it because it is combined with EquiFACS and other scales? This is a crucial point to defend in this manuscript, because it might be pointless to create yet another scale, if we could just use the HGS. (if this was the case, then the manuscript would then have to be reframed entirely in order to make clear that this is a validation/adaption of the HGS for a sedation situation and not a creation of a totally new scale).

Line 190-192: I don't quite understand this part of the sentence... photos from the other studies were used to describe sedation scores how? Also, how did untreated pictures of horses contribute to describe sedation scores? This needs to be rephrased as it is a bit confusing and vague. Also, the study cited after EquiFACS is not a study on untreated horses per se, but it is instead a publication detailing the development of a FACS tool to measure muscle-based facial movements in horses (in any situation). Furthermore, the studies cited in 20,21,30, are development of scales, so please name each scale from where the information was extracted. This should be corrected.

Line 192: If FaceSed is derived from a combination of HGS and EquiFACS, was any of the initial developers certified in EquiFACS? Please be aware that only certified coders in EquiFACS can use it. If yes, please add the name of which authors are EquiFACS certified, and if not please clearly state this. However, please be aware that if none of the authors is EquiFACS certified and still is using it, this is going against the guidelines for appropriate use of this tool. Please see https://www.animalfacs.com/equifacs, for more information. I would in fact, strongly advise the mention to EquiFACS to be deleted as a part of the methods for the FaceSed development if none of the authors has the certification. This can then be discussed as a limitation or a future direction in the discussion.

Line 195: How was the relevance attributed? Was it purely based on the subjective opinion of each evaluator, i.e. educated guesses? If so, please state this clearly; if there were basic criteria, please state them instead. If this was purely based on the subjective opinion of each evaluator, it would be interesting to disclose the reasoning for including/excluding items, or at least examples. Heavily relying on "expert opinion" to develop an objective tool is always problematic since it often incurs in confirmatory biases. For example, veterinary anaesthesiologists might have certain impressions/ideas about what happens in the horse's face during anaesthesia, these ideas are then used to decide how to evaluate which items matter for evaluating horse's faces during anaesthesia, and the same ideas are then used to build an "objective" scale. This is particularly important as facial expressions are very subtle and hard to detect by people not trained to detect these facial behaviours. This is even harder to do in pictures, as there's considerably less information to decide if a movement is present or not.

Line 205: Right, so this should have been described above (see my previous comment).

Line 226: Please briefly explain what concurrent criteria validation is. The same for construct validity and the other validation measures.

Line 271: Please add versions of each software.

Line 427-430: Please explain why there is this discrepancy.

Line 440: But these were not apparent in the pictures, so how can it have influenced the evaluators?

Line 444-446: But this will assume the first scale is of high quality in all measurement components, plus it assumes there is already an existing scale. So I am not sure this is a "need" per se, instead it probably is more a term of comparison with previous work.

Lines 458-460: I didn't really understand what the authors are trying to say here, please rephrase.

Line 467: In other words, all items correlate with each other? Please explain what this means in practice.

Line 474: Does insufflate here means increase?

One thing I am not clear is if all the analysis undertaken here for the FaceSed assume the item for eyes is independent from the mouth or does assume they all are dependent? Maybe this could be discussed around line 478.

Line 494: FACS systems do not describe "characteristics", they describe Action Units. This distinction is important because the facial movements are anatomically-based. Also the name of the muscle is not "elevator", but "Levator palpebrae superioris".

Line 499-503: I was wondering till this paragraph why the authors ignored the nose area for horses. I assumed it simply did not present any movement during pain/sedation. But here the authors actually mention an important cue for sedation on the nose. I think it is important to discuss why the nose is not part of the scale. (there's good evidence to argue that this might be due to human biases and the way people process facial expressions in general, where noses do not attract attention, both in people and other animals, and so pain/sedation cues here might probably be missed. See for example: https://doi.org/10.1007/s10071-020-01348-5 and https://doi.org/10.1016/j.anbehav.2009.11.003). I think this point should be discussed regarding the nose or other facial regions not included in the scale, but also as a general limitation of these types of scales. I understand the authors looked at different instruments published and combined them, but even so, if this is not done in a systematic way, there will always be human biases (both from hardwired facial processing mechanisms, i.e. we don't tend to look at noses so much, and from expert/confirmatory bias, i.e. experts assume the important cues are in certain regions and do not attend to other regions).

Line 509-511: This is a very good point, but I would add that other than being extremely hard to capture the full expression (I am assuming here full means maximum intensity or with all facial regions acting/relaxing), it is up to the photographer to determine the moments of capture. Furthermore, measuring movement in photographs is very difficult, as it doesn't account for individual differences. See for example: https://doi.org/10.3389/fvets.2019.00154.

Line 514: Can the authors expand on an example of this if already exists in horses, or in other animals if published? The reference provided is not appropriate as it is not a computational approach (the landmarks are manually defined in each picture). A better example of what the authors are trying to say is perhaps this: https://doi.org/10.1016/j.conb.2019.10.008 (DeepLabCut Model Zoo has a module for primate faces with 45 markers that track movement).

Does FaceSed need training or specific backgrounds (e.g. only anaesthesiologists, only people very familiar with horses) to be used or anyone can use it? Please add that to your final remarks.

The ears pictures all have zero above them.

Reviewer #2: dear authors,

you have performed a very nice study that will add to the knowledge on objective assessment of quality of sedation in horses. The reviewer only has some minor issues that should be addressed, among them there are some questions about the statistics you have used and about the way criteria validation has been performed:

abstract:

line 23: "measuring" instead of "measured"

line 25: "performed" instead of "perfomed"

line 26: instead of "associated" you could better use the word "combined"

line 32: you state that intra- and interobserver reliability were good to very good, but the reader does not know at this stage what the values 0.74-0.94 are? (ICC, Cronbach's alpha??)

line 36: you state in this line that the scale was unidimensional according to principal component analysis, why? maybe some explanation over here?

line 38: you state in this line that item-total correlation was adequate, although the range of values is 0.3-0.73. It is not clear again what values these are (ICC?), but the lower range of 0.3 does not seem to be high enough for adequate?

introduction:

line 49: you mention standing horses, what about the role of sedative drugs in premedication for general anaesthesia?

line 64: instead of "sedation" you could maybe better use "depth and quality of sedation"?

materials and methods:

line 80-81 and line 95: what do you mean with "this study was opportunistic of another study"?

line 90: instead of "has begun" you could better use "began" or "would begin"

line 96-101: the reviewer thinks you should mention over here what the calculated sample size was?

phase I:

line 107: instead of "data" you should use the word "parameter"

line 113-114: what do you mean with "in situ registration"?

line 116: how was the treatment randomized?

line 117: it is not clear what the website you mention over here should tell the reader?

line 118: "Those" probably reflects to the treatment, but this is not grammatically sound

Selection of photos and evaluations

line 151: "that" instead of "who"

line 151: "plasma" instead of "plasmatic"

line 151-152: what do you mean with "corroborated by the degree of sedation recorded in situ"??

line 161: isn't 120 minutes after ACP administration a bit early for the higher dosage of ACP to determine end of tranquilisation?

line 162: "did not communicate with each other" instead of "between them"

line 174-176: why did you choose this 80% as the correlation that was needed to start the main evaluations?

line 183-184: you mention over here that the observers were asked to score the NRS first and after that the FaceSed. Why did you choose for this order? The main research question was about FaceSed and not about the NRS scoring, right? Might this previous NRS scoring have influenced the following FaceSed scoring? this could be a source of bias that is incorporated in the study design.

It would have been much better if the NRS scores and FaceSed scores had been disconnected from each other. Another problem is that you analyze the correlation between the NRS and the FaceSed, but because these were scored immediately after one another, these observations were not independent and the correlation is therefore not valid.

development of the Facesed scale

line 193: "experience in sedation" instead of "experience of sedation"

Statistical analyses

line 204: "...who attributed to the importance...."

line 210: what do you mean with "the following analysis"?

line 211: what do you mean with in-situ?

line 230: for concurrent criteria validation, the reviewer thinks that it is not the most logical choice to compare the FaceSed with the NRS, since this scale also holds some subjectivity and since these were assessed and taken immediately after one another, they are not independent. Therefore, it would be better to compare FaceSed only with the HHAG%, since this is a completely objective and independent parameter.

line 241-242: this sentence is not grammatically sound, what do you want to explain with this sentence?

line 243: I think you would need to explain what the eigenvalues and variance mean. These are not necessarily parameters that all readers are familiar with.

line 255: the range of acceptable values for the item-total correlation of 0.3 to 0.7 seems to have a very low lower margin for an acceptable Spearman correlation? A correlation of 0.3 would seem to be very low and not acceptable according to the reviewer?

line 268: I would change the text into "...the frequency of sedation scores that were assigned by the evaluators...."

line 272: you could add "statistical significance was accepted at P<0.05"?

Results

line 295: the range of repeatability of the FaceSed sum of 0.74-0.94 seems to be the combined range of all 4 observers taken together. This should be mentioned like this in this line.

line 298: instead of "steps" it would be better to use the term "observations"

line 372: as mentioned earlier, a correlation of 0.3 does not seem to be acceptable according to the reviewer.

line 373: in this line you state that all parameters showed an item-total correlation that was acceptable (although I would doubt this with a correlation of 0.3), except for the item eyes. What was the item-total correlation for this parameter?

line 378-379: The Cronbach's alpha was used as a measure of internal consistency. With what was the FaceSed score compared to determine this value?

Discussion:

line 409: "low degree of sedation" instead of "low sedation"

line 427-428: could you maybe discuss this difference, for instance due to the technically more difficult task of observing ridden horses?

line 432: "the biases that could affect.."

line 436-437: you are right that scoring of the FaceSed did not bias scoring of the NRS, but the other way around, there might be a reason for bias. The fact that the NRS was first scored and immediately after that the FaceSed, might have influenced the reproducibility and repeatability of FaceSed and the correlation between FaceSed and NRS. This is what you mention in line 446, however, this high correlation might be due to how the scoring was performed.

line 448: this is not how you describe it earlier in your materials and methods: there, you say that for criterion validity, the FaceSed is compared to the NRS and the HHAG%.

line 461: instead of "....tranquilisation and low and high sedation.." you could better formulate "...tranquilisation with ACP from low and high sedation intensities with detomidine (with or without methadone)". Maybe could you also hypothesize about a possible reason for this?

line 484: in this paragraph you discuss sensitivity and specificity. Was it possible to determine cut-off values that optimally discriminate between non-sedated and sedated horses?

line 494-497: you describe the difference between orbital closure due to being relaxed compared to due to being in pain. This lack of discriminatory power on this parameter could also lead to a false positive score; could you underline this better maybe?

line 513: instead of "of", you could better use "..tranquilisation from..."

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jun 1;16(6):e0251909. doi: 10.1371/journal.pone.0251909.r002

Author response to Decision Letter 0


22 Mar 2021

PONE-D-20-32183

Development and validation of the facial scale (FaceSed) to evaluate sedation in horses

PLOS ONE

Dear Dr. Luna,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 15 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

• A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

• A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

• An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Chang-Qing Gao

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere:

Answer: 'The only data that have been previously published are the measurement of head height above the ground (HHAG) collected on site. However, in the original study (Gozalo-Marcilla et al. Sedative and antinociceptive effects of different detomidine constant rate infusions, with or without methadone in standing horses. Equine Vet J. 2018;51: 530–536. doi:10.1111/evj.13054), these data are expressed in centimeters and in the present study in percentages. This information has been included in the methods section. The same data (HHAG%) have been published in another article about a behavioural sedation scale (EquiSed) (Frontiers in Veterinary Medicine, doi: 10.3389/fvets.2021.611729).

To our knowledge the inclusion of HHAG data in this manuscript does not imply dual publication because the data were used only for comparison against the proposed scale.

Please clarify whether this publication was peer-reviewed and formally published.

Answer: Yes, both publications have been peer reviewed. One has been formally published (Gozalo-Marcilla et al. Sedative and antinociceptive effects of different detomidine constant rate infusions, with or without methadone in standing horses. Equine Vet J. 2018;51: 530–536. doi:10.1111/evj.13054) and the other has recently been published (Oliveira et al. Development, Validation, and Reliability of a Sedation Scale in Horses (EquiSed). Frontiers in Veterinary Medicine, doi: 10.3389/fvets.2021.611729).

If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.

Answer: To our knowledge the specific data of HHAG are not considered dual publication because they are used only for comparison with the Facial scale. This is the only objective and standard method to identify sedation in horses that can be used to compare with a new sedation method being developed (FaceSed) (line 497– Table 4).

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information


Answer: Included line 762

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Answer: Dear Reviewer, we hope to have satisfied this item after this revision. If not, please let us know what further requirements are necessary.

Reviewer #2: Yes

________________________________________

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Interesting study done as part of/in tandem with another study on anaesthesia, where the authors develop and validate a scale for sedation levels in horses through facial cues. The work follows previous publications on the Horse Grimace Scale and combines this with a couple of other works to build the FaceSed. The work done to validate the scale is comprehensive, but I think several clarifications and more information are needed at this point. I also think the authors need to better justify the need for this scale as opposed to just using the HGS and inverting the scores (in which case this study could simply present an application of the HGS in sedation contexts).

Answer: Dear Reviewer

The authors appreciate your time and effort spent reviewing this manuscript, and thank you very much for your comments. All corrections have been performed according to the Reviewers' suggestions, and each comment responded to separately. Changes are highlighted according to the Journal's recommendations. We hope that after these corrections, you consider the manuscript suitable for publication, but we are happy to answer any further questions.

Please find below the justification regarding the need for this scale as opposed to simply using the HGS and inverting the scores.

Line 16: Sorry, maybe I'm missing something, but I don't understand the author contributions marked with "&". There is only one author marked with "&", so contributed equally to whom?

Answer: Excluded.

Line 51: I would suggest adding a short paragraph on the different sedation agents usually used here for horses and why, and then in the methods justify based on this why the authors chose the sedation agents for this study. (I am expecting these to be for example the most common ones).

Answer: Included – introduction (lines 52 - 58) and methods (lines 171 -175)

Line 53-57: Please expand on each of these scales, including what they are, how they measure sedation, differences between them, and why, in the author's opinion, they are more or less objective. Also, mention briefly if they were used after development (cite all studies if not many were done). This is the state of the art of this introduction, so it needs more information as not everyone might be familiar with all the scales published to date. Furthermore, this will set the scene to then explain why the current new method is better than all of these past ones.

Answer: The scales have been described in detail along with the differences between them (lines 59-83).

Line 54: I guess this will be replied once there is more information about each scale, but is HHAG a scale or a one item measurement? The references are for studies, I assume, applying this measure. But the reference here should be only the publication that developed the scale. So if I want to use this scale, which publication was the pioneer in using this measurement? The subsequent studies can then be cited in applications of this HHAG tool (and the same goes for the other scales).

Answer: More information regarding the scales has been included (lines 59 – 83) as well as the pioneer references. HHAG is only one item measurement and this has been clarified.

Line 58-62: why were these scales deemed convenient? In what way? Also, please add which scales are used for clinical or experimental purposes. The name of each scale, whenever mentioned, should be in the text, so the reader doesn't need to go back to the references every time.

Answer: A more detailed description has been incorporated, including the limitations (lines 84 – 94).

Line 61-62: Do the authors mean here that in order to use some scales, the handler needs to administer certain visual or auditory stimuli to the anaesthetised animal? If so, please clarify this idea, as it seems a bit vague at the moment, and give examples of what kind of visual/auditory stimuli are part of the scales. Also, if I understood this correctly, there is another potential problem other than the ones cited by the author already, as delivery of these stimuli (e.g. intensity, quality) will probably vary a lot with the handler, which can create a confounding variable. For example, a lower intensity stimulus means a lower response and an assumed higher sedation, when this might not be true.

Answer: Thanks for pointing that out. We believe the description of the stimuli performed in other studies has now been clarified along with the different scales used (lines 84 – 94). The variation in the delivery of these stimuli according to the handler has also been included as a limitation of studies assessing sedation scales (lines 92-94).

Line 63: The content related to facial expressions within each of the scales mentioned above needs to be expanded. Please include, which scales contain facial expression attributes, and give examples for each scale of these facial attributes.

Answer: Included lines 95 – 106.

Line 63-64: the second part of this sentence is a bit confusing for me. These scales were developed for the sole purpose of identifying sedation, so how can they not be relevant? I am guessing by the next sentence, the authors mean validation, consistency, reliability, etc., so perhaps the word relevance is better replaced by something else, like psychometric attributes or something similar? The authors vaguely pass the main important idea here, which is that anyone can develop any scale, but the quality of it as a tool for measuring a construct has to be assessed after development. This needs to be made clearer.

Answer: Thanks for pointing that out. The whole paragraph has been amended and details added (lines 95 - 106 and 121 – 122).

Line 63: I know the authors intend to develop a tool, but more background on the tool development is needed. Perhaps around here, the authors need to explain why the face and facial expressions are relevant for 1) pain and thus 2) evaluating sedation, but also 3) for horses. This needs to be argued well, since some scales include facial attributes and others not. There is also the issue of horses being a prey species and hence, facial signs of pain might not be present (adaptively, they would not be present). What is so special about horse faces (and faces in general) that makes them good indicators for sedation scales? I would suggest here maybe mention some human literature on facial expressions, and/or what we know about horse making use of their facial expressions in general (there are some studies on this). This can then be linked with for example, the HGS that is already mentioned on line 66.

Answer: Background about facial expression and its relevance in horses has been included, lines 95 -106.

Line 66: It is great to have these examples, but please mention which ones are being cited. Also, while the examples are important, the reason why a scale needs validation is not because others validated their scales before. So please add why it is important to validate a scale.

Answer: The scales have been mentioned, lines 109 – 110 and the reason behind scale validation has been included, lines 107-109.

Line 69-70: why are they important? Please expand.

Answer: Included lines 113-120.

Line 74: what criteria? please mention them again. As these are the goals of this study, it has to be clear what the authors propose on doing. The prediction is not really a prediction at the moment. It needs to actually predict what the authors think it's going to measure in the different phases of the experiment, and be justified as well based either on lit or logical statements (if lit is not available). Also, the "why" of this study needs to be added to the goal.

Answer: The criteria and predictions have been included in lines 121 – 132. The reason why the study was performed has been described in lines 126 – 139.

Line 93: Were phase I and phase II done on different days? The same individual was thus subjected to all the treatments, i.e. detomidine, detomidine+methadone, acepromazine, on different days, correct? Please make this clear here.

Answer: Phases I and II were performed on different days with the same horses, eleven months apart. This information has been included, lines 160 – 164.

Also, make it clear that what the current manuscript is reporting, is only phase II, and phase I is described only for understanding of phase II. This is important in order to not plagiarise the work published in phase I.

Answer: This manuscript is reporting Phases I and II, and the only data in this manuscript that are shared with the previous study are the values of head height above the ground. Included, lines 179 – 181.

Line 110: how was the baseline decided?

Answer: The baseline was the measurement of the head height above the ground when the horse was unsedated, before the administration of the treatments. This has been included, lines 184 – 187.

Line 113: What does in situ registration means?

Answer: Excluded because it is not relevant (in situ = at the experimental moment).

Line 117: I don't think the links for this and the power sample above need to be in the text; they can be converted into a reference - maybe check the PLOS ONE referencing guide or a style book for proper in-text citation of websites.

Answer: The link has been included as a reference.

Line 128-131: I'm assuming this was done in phase I but is part of the study reported in this manuscript, correct? This needs to be made clearer. The authors can simply add something like "For the previous study...", "For this study...". Again, it is important to understand what belongs to the study here reported and the study previously published.

Answer: We believe this is now better explained, line 206 and line 230. Please let us know if further clarification is required.

Line 130: Why these time points? Is there previous evidence in the literature indicating these time points as ideal sampling?

Answer: These time points were selected according to previous studies with similar methodology and pharmacokinetic data. References have been included in lines 209 - 211.

Line 134: I was surprised here to see that phase I and II had almost a year in between. This information should come before phase I description. Why was there such a long time interval? This means that all the horses were a year older (or just about).

Answer: This information has been included before Phase I, line 161. We decided to include the acepromazine groups after Phase I to assess the validity of the FaceSed not only for sedation but also for tranquilization. Unfortunately, the facilities, horses, and authors were not available to perform the study earlier. This has been included in the discussion (line 591 - 595).

Line 137: Please include what was exactly measured here. Was there a baseline point, HHAG, sedation evaluations and pictures taken in 6 time points, from 5 till 120min? Why here there were less time points?

Answer: More details have been included. The horses were only maintained for 120 minutes because during this phase there was no other concomitant study; therefore, it was not necessary to keep the horses in the stall for longer than 120 minutes as in phase I. Lines 221 – 225.

Line 143: I am a bit confused here, what is the main study here referring to? The previously published one? I would suggest the authors find a way of better distinguishing between these two studies (e.g. give it a name) and then refer to it consistently and clearly define what was in one study and what is in the current study. As a reviewer, I can't really comment on a previous published study in the same way I should comment on the present work, but there are several moments I am not sure which one is which. I think it's great to take advantage of previous studies to collect data, but the reporting of these studies should be carefully done. As I mentioned before, there are auto-plagiarism concerns when publishing twice the same work, and also it can give the impression this is somehow going towards the practice of salami slicing (see for example: https://doi.org/10.1038/nmat1305 or https://doi.org/10.1007/s10459-019-09876-7). Both in the introduction (goals) and methods, throughout the text, it needs to be perfectly clear what was the goal of the previous published study, what is the goal of this study, how do they differ, and why are they being published in separate articles.

Answer: Corrections have been performed throughout the manuscript to differentiate data collected from the previous and present studies. We had to split these studies because they had completely different objectives and the amount of data would be excessive for only one publication (line 144-149). Thanks for addressing these points, your comments were very useful to improve the clarity and quality of the present manuscript.

Line 152: I am unsure what reference 28 is supposed to mean here, since it refers to a conference presentation...? First, if the reference is correct, something like "personal communication" should be added, since this is not a peer-reviewed, published work and it is not accessible. In fact, according to PLOS ONE guidelines, this should not even be cited:

Do not cite the following sources in the reference list:

• Personal communications (these should be supported by a letter from the relevant authors but not included in the reference list)

If there are more cases of conference presentations, please correct them.

Answer: We are sorry for referencing a conference abstract. This has been removed and replaced with the recently published article.

Second, the "degree of sedation" evaluation criteria needs to be explained (either here or in supplementary materials).

Answer: Included, lines 238– 240.

Line 155: Why wasn't the end of sedation also determined through plasmatic concentration?

Answer: The end of sedation time-point was determined based on low sedation scores and on the residual plasma concentration according to the previous pharmacokinetic study performed simultaneously. This information has been included, line 244.

Line 156-161: The concentration of acepromazine was not measured, correct? If so, the authors shouldn't mention "reduction in concentration", since this was not measured.

Answer: The authors agree that “reduction in concentration” was misused. Corrected, lines 249-250.

I am also a bit unsure about the way the justification for the time points is being presented. Obviously, the time points were chosen a priori since they are so uniform (i.e. every half hour), so unless there are studies showing that this is indeed such a regular action of the sedation agents, this needs to be corrected. It is important to either explain this differently or correct the justification for each time point, as at the moment it incurs in circular reasoning, i.e. the authors chose 60m, 90m, 120m a priori, then make the degree of sedation scores fit these points and use it to justify why these points were chosen. In order to correct this, either simply assume these time points were chosen for convenience or clearly explain how the degree of sedation scores were calculated (it is ok to go in supplem. materials if needed) and how much they had to increase/decrease in each point. And particularly, carefully justify why it just so happens to be such convenient time points for peak and so on.

Answer: The reasons behind the selected time points have been better described in lines 245 – 255.

Line 164: why is it relevant that the 3 other coders are from different institutions and hold this diploma for scoring of facial expressions? All evaluators are authors of this manuscript, so is it right to assume all evaluators were aware of the goal of the coding? Also, was any of the evaluators present on site during the sedation experiments? If so, please add this information.

Answer: Independent blind evaluators are relevant for validation studies in order to minimize expectation bias. The main author was present during the experimental work and was responsible for selecting the photos. We have included the information that the observers were Diplomates in Veterinary Anaesthesia only to inform readers that the observers have experience in the area; therefore, the scale must be tested in the future with inexperienced observers to check its reliability. This information has been included in the manuscript (methods: lines 261-266 and discussion: lines 722 - 727).

Line 170: Which author randomised and selected the photos? If one of the evaluators also did this, then please clearly state that this evaluator was not blind to the treatments. If not, please add who did this step.

Answer: Included, lines 263 and 269 - 270.

Line 175: How many rounds of coding were needed to obtain 80%? Was this an ICC test? If not, which correlation type was used? And what was the intra-observer reliability for the training? (I'm assuming that the evaluations done one month apart were for this - if not, please explain why that was done). Was the intra-observer reliability taken into account before moving to the test coding?

Answer: The evaluators scored the photos only twice, with an interval of one month between viewings (lines 273-278). The intra- and inter-observer reliability was 80% (Spearman correlation). The intra-observer reliability is the comparison between the first and second evaluation for each observer, and the inter-observer reliability is the matrix correlation comparison among observers at the second evaluation. Spearman correlation analysis was performed and included (lines 273-276); both intra- and inter-observer reliability were over 0.8.
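
For readers who want to reproduce this kind of check, a minimal sketch is given below; it assumes the training scores are stored in a table with one row per photo and hypothetical columns obs1_r1 … obs4_r2 (observer × evaluation round), and is an illustration rather than the authors' code.

```python
# Minimal sketch (not the authors' code) of the reliability check described
# above, assuming a CSV with one row per photo and hypothetical columns
# obs1_r1 ... obs4_r2 (observer x evaluation round).
import pandas as pd
from scipy.stats import spearmanr

scores = pd.read_csv("training_scores.csv")  # hypothetical file name
observers = ["obs1", "obs2", "obs3", "obs4"]

# Intra-observer reliability: first versus second evaluation for each observer
for obs in observers:
    rho, _ = spearmanr(scores[f"{obs}_r1"], scores[f"{obs}_r2"])
    print(f"{obs}: intra-observer rho = {rho:.2f}")

# Inter-observer reliability: Spearman correlation matrix among observers
# at the second evaluation
second_round = scores[[f"{obs}_r2" for obs in observers]]
print(second_round.corr(method="spearman"))
```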

Line 178-180: This is really not relevant information. What we need to know is what information was given to the evaluators with each photo pair (if any).

Answer: This information has been excluded, and guidelines are available as supplementary information (Guidelines), lines 282 to 285.

Line 181: Can these instructions be added as supplementary materials please?

Answer: Supplementary information has been included and attached to the manuscript, line 282.

Line 183: Is the NRS used here published anywhere? If so, please provide references. If not, please add the scale in supplem. materials. In any case, briefly describe what this scale is and how is it applied. Were all the evaluators very familiar with this scale?

Answer: The NRS used here was published and used for pain assessment – reference included in line 294. It is a simple and intuitive scale. The evaluators were instructed in the guidelines on how to use the scale (supplementary material) before starting the analysis of the photos. This has been included in lines 282 – 285.

Line 188: So the FaceSed is based on the HGS? Please make this clear (instead of vaguely indicating a reference that makes the reader go down to the reference list to understand what is being said), as it is an important methodological step. Also make it clear that eye tightening, strained mouth and stiff ears are indicators of pain in the HGS. Finally, please add more specific descriptions of what these terms mean exactly (according to the HGS).

Answer: A detailed description has been included in lines 288 – 309.

Line 189: When I read this, the first thought in my mind was: why do we need the FaceSed then, since we can just invert the scores from the HGS? Maybe rephrase this sentence or explain better how it is any different from the HGS. If FaceSed is basically scoring the relaxation of muscles from an apex point to a neutral point, then we don't really need it, as muscles do not contract and relax in different ways. How does FaceSed differ from an inverted HGS? Is it because it is combined with EquiFACS and other scales? This is a crucial point to defend in this manuscript, because it might be pointless to create yet another scale, if we could just use the HGS. (if this was the case, then the manuscript would have to be reframed entirely in order to make clear that this is a validation/adaptation of the HGS for a sedation situation and not a creation of a totally new scale).

Answer: Although the FaceSed was based on the main facial characteristics described in the HGS as a starting point, it is not possible only to invert the scores of the HGS, because the FaceSed was developed with specific descriptors characteristic of sedation in the horses of the present study (this has been included in lines 684 – 687 of Discussion section). The authors considered the reviewer's suggestions and have rewritten the description of the scale development (lines 288- 309), including the differences between the FaceSed and HGS.

Only the orbital aperture was coincident in both scales. The basic difference between the scales is that, except for orbital aperture, the neutral point is the only coincident score between both scales. Pain produces contraction and sedation produces relaxation, therefore simply inverting the score does not provide the same facial expression. This has been better explained in lines 684 – 694.

Line 190-192: I don't quite understand this part of the sentence... photos from the other studies were used to describe sedation scores how? Also, how did untreated pictures of horses contribute to describe sedation scores? This needs to be rephrased as it is a bit confusing and vague. Also, the study cited after EquiFACS is not a study on untreated horses per se, but it is instead a publication detailing the development of a FACS tool to measure muscle-based facial movements in horses (in any situation). Furthermore, the studies cited in 20,21,30, are development of scales, so please name each scale from where the information was extracted. This should be corrected.

Answer: Thanks for your contribution to the Method description. The methodology of the development of the scale has been completely rephrased. We hope now that it is more comprehensive, as we used other studies to decide which facial action units would be representative to evaluate sedation from the state of normality (lines 684 – 694).

Answer: We only used EquiFACS to identify possible indicators of movement of the facial musculature which would resemble easily identifiable facial units described in other studies of horses under pain and sedation. We did not use or apply EquiFACS in our assessments because none of the authors is EquiFACS certified. I hope this issue is now better described– lines 300-303.

Line 195: How was the relevance attributed? Was it purely based on the subjective opinion of each evaluator, i.e. educated guesses? If so, please state this clearly; if there were basic criteria, please state them instead. If this was purely based on the subjective opinion of each evaluator, it would be interesting to disclose the reasoning for including/excluding items, or at least examples. Heavily relying on "expert opinion" to develop an objective tool is always problematic since it often incurs confirmatory biases. For example, veterinary anaesthesiologists might have certain impressions/ideas about what happens in the horse's face during anaesthesia; these ideas are then used to decide how to evaluate which items matter for evaluating horses' faces during anaesthesia, and the same ideas are then used to build an "objective" scale. This is particularly important as facial expressions are very subtle and hard to detect by people not trained to detect these facial behaviours. This is even harder to do in pictures, as there's considerably less information to decide if a movement is present or not.

Answer: The content validation description was the first item of statistical analysis. It has been moved to before the Statistical analysis and included as supplementary information. Content validation was achieved in three steps explained in the manuscript. Although an “expert” committee is a traditional way to assess content validity, we included two other steps to minimize subjective opinion (Streiner DL, Norman GR, Cairney J. Health measurement scales: A practical guide to their development and use. 5th ed. New York: Oxford University Press; 2015)– lines 310 – 320.

Line 205: Right, so this should have been described above (see my previous comment).

Answer: Included before statistical analysis as suggested.

Line 226: Please briefly explain what concurrent criteria validation is. The same for construct validity and the other validation measures.

Answer: More detailed information has been included. The most common way to test concurrent criterion validation is to correlate the proposed instrument with a gold-standard one. The construct validity or responsiveness consists of changes over time indicative of the presence and absence of the target phenomenon, in our case sedation (lines 339 - 343 and 351 - 353).

Line 271: Please add versions of each software.

Answer: Included in lines 421 – 422. R-Studio Team – 2016.

Line 427-430: Please explain why there is this discrepancy.

Answer: The reason behind the discrepancy between the studies has been included in lines 601 – 603.

Line 440: But these were not apparent in the pictures, so how can it have influenced the evaluators?

Answer: Sweating and low HHAG might be apparent in some pictures. This has been included in the discussion (line 615) and also as a limitation of the study (lines 725 - 727), with information that evaluators were blind to treatments but, because of this limitation, possibly not to the moments.

Line 444-446: But this will assume the first scale is of high quality in all measurement components, plus it assumes there is already an existing scale. So I am not sure this is a "need" per se, instead it probably is more a term of comparison with previous work.

Answer: The paragraph has been rephrased (lines 625 – 630).

Lines 458-460: I didn't really understand what the authors are trying to say here, please rephrase.

Answer: The paragraph has been rephrased (lines 632- 639).

Line 467: In other words, all items correlate with each other? Please explain what this means in practice.

Answer: The practical implication has been included (lines 656-658).

Line 474: Does insufflate here mean increase?

Answer: The sentence has been slightly modified to add clarity about the meaning of this value to: ‘showing that the orbital opening (0.73) may be a restatement of other FaceSed items’ (lines 671 - 674).

One thing I am not clear about is whether all the analyses undertaken here for the FaceSed assume the item for the eyes is independent from the mouth, or assume they are all dependent? Maybe this could be discussed around line 478.

Answer: A more detailed explanation about item-total correlation has been provided (lines 666 – 671).

Line 494: FACS systems do not describe "characteristics", they describe Action Units. This distinction is important because the facial movements are anatomically-based. Also the name of the muscle is not "elevator", but "Levator palpebrae superioris".

Answer: Corrected to Facial Action Units throughout the manuscript where applicable.

Line 499-503: I was wondering till this paragraph why the authors ignored the nose area for horses. I assumed it simply did not present any movement during pain/sedation. But here the authors actually mention an important cue for sedation on the nose. I think it is important to discuss why the nose is not part of the scale. (there's good evidence to argue that this might be due to human biases and the way people process facial expressions in general, where noses do not attract attention, both in people and other animals, and so pain/sedation cues here might probably be missed. See for example: https://doi.org/10.1007/s10071-020-01348-5 and https://doi.org/10.1016/j.anbehav.2009.11.003). I think this point should be discussed regarding the nose or other facial regions not included in the scale, but also as a general limitation of these types of scales. I understand the authors looked at different instruments published and combined them, but even so, if this is not done in a systematic way, there will always be human biases (both from hardwired facial processing mechanisms, i.e. we don't tend to look at noses so much, and from expert/confirmatory bias, i.e. experts assume the important cues are in certain regions and do not attend to other regions).

Answer: Thank you very much for pointing this out and for the suggestions on the literature. This paragraph has been amended, lines 715 – 719.

Line 509-511: This is a very good point, but I would add that other than being extremely hard to capture the full expression (I am assuming here full means maximum intensity or with all facial regions acting/relaxing), it is up to the photographer to determine the moments of capture. Furthermore, measuring movement in photographs is very difficult, as it doesn't account for individual differences. See for example: https://doi.org/10.3389/fvets.2019.00154.

Answer: Thank you again for your great suggestion which has been incorporated, lines 736 – 739.

Line 514: Can the authors expand on an example of this if one already exists in horses, or in other animals if published? The reference provided is not appropriate as it is not a computational approach (the landmarks are manually defined in each picture). A better example of what the authors are trying to say is perhaps this: https://doi.org/10.1016/j.conb.2019.10.008 (DeepLabCut Model Zoo has a module for primate faces with 45 markers that track movement).

Answer: Thanks for suggesting the reference, which has been included (lines 745 – 746).

Does FaceSed need training or specific backgrounds (e.g. only anaesthesiologists, only people very familiar with horses) to be used or anyone can use it? Please add that to your final remarks.

Answer: Included in lines 748 – 751.

The ear pictures all have a zero above them.

Answer: Corrected

Answer: We really appreciate your comments.

Reviewer #2: Dear authors,

You have performed a very nice study that will add to the knowledge on objective assessment of quality of sedation in horses. The reviewer only has some minor issues that should be addressed; among them are some questions about the statistics you have used and about the way criteria validation has been performed:

Answer: Dear Reviewer

The authors appreciate your time and effort spent reviewing this manuscript, and thank you very much for your comments. All corrections have been performed according to the Reviewers' suggestions, and each comment responded to separately. Changes are highlighted according to the Journal's recommendations. We hope that after these corrections, you consider the manuscript suitable for publication, but we are happy to answer any further questions.

abstract:

line 23: "measuring" instead of "measured"

Answer: Corrected

line 25: "performed" instead of "perfomed"

Answer: Corrected

line 26: instead of "associated" you could better use the word "combined"

Answer: Changed to combined

line 32: you state that intra- and inter-observer reliability were good to very good, but the reader does not know at this stage what the values 0.74-0.94 are? (ICC, Cronbach's alpha??)

Answer: These analyses are ICC; thank you for your observation. The ICC has been included, line 32.

line 36: you state in this line that the scale was unidimensional according to principal component analysis, why? maybe some explanation over here?

Answer: The explanation has been included that all items had load factors above 0.5 at the first dimension (line 38).

line 38: you state in this line that item-total correlation was adequate, although the range of values is 0.3-0.73. It is not clear again what values these are (ICC?), but the lower range of 0.3 does not seem high enough to be adequate?

Answer: The Spearman correlation has been included and so has the measurement in rho (line 40). This range follows the reference for item-total correlation analysis, which identifies how much each item is correlated with the total score of the scale (Streiner et al., 2015). A detailed description of the adequacy of this range is given in the discussion, lines 668-672.

introduction:

line 49: you mention standing horses, what about the role of sedative drugs in premedication for general anaesthesia?

Answer: This has been included in the 2nd sentence of the first paragraph of the introduction (lines 54-55).

line 64: instead of "sedation" you could maybe better use "depth and quality of sedation"?

Answer: The paragraph has been rephrased according to the suggestion of Reviewer 1, and depth and quality of sedation has been included throughout the manuscript where applicable.

materials and methods:

line 80-81 and line 95: what do you mean with "this study was opportunistic of another study"?

Answer: The word opportunistic has been replaced by other terms. The data of this study were collected during other parallel studies performed simultaneously and this has now been better explained in the manuscript (lines 126-129).

line 90: instead of "has begun" you could better use "began" or "would begin"

Answer: Changed to began, line 157.

line 96-101: the reviewer thinks you should mention over here what the calculated sample size was?

Answer: The number of horses for the sample size calculation has been included. Line 165.

phase I:

line 107: instead of "data" you should use the word "parameter"

Answer: Changed to parameter, line 180.

line 113-114: what do you mean with "in situ registration"?

Answer: The sentence has been rephrased and in situ is no longer used. In situ has been replaced with on site throughout the manuscript.

line 116: how was the treatment randomized?

Answer: Included in lines 193 – 195.

line 117: it is not clear what the website you mention over here should tell the reader?

Answer: The website has now been included as a reference according to the Journal's instructions. It was used as a tool to randomize the treatments. Line 194.
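
As a side note, the randomisation itself is easy to reproduce; the sketch below shuffles the treatment order per horse and is purely illustrative (the study used random.org, and the two acepromazine treatments belonged to a later phase, so this is a simplification).

```python
# Purely illustrative sketch: shuffling treatment order per horse.
# The study itself used random.org, and the acepromazine treatments
# were run in a later phase, so this is a simplification.
import random

treatments = ["DL", "DH", "DLM", "DHM", "ACPL", "ACPH"]
horses = [f"horse_{i}" for i in range(1, 8)]  # seven horses

random.seed(1)  # arbitrary seed so the example is reproducible
for horse in horses:
    print(horse, random.sample(treatments, k=len(treatments)))
```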

line 118: "Those" probably reflects to the treatment, but this is not grammatically sound

Answer: “Those” has been changed to “the treatments” line 193.

Selection of photos and evaluations

line 151: "that" instead of "who"

Answer: Changed to “that”, line 238.

line 151: "plasma" instead of "plasmatic"

Answer: Changed to plasma, line 238.

line 151-152: what do you mean with "corroborated by the degree of sedation recorded in situ"??

Answer: The sentence has been rephrased to explain how and where the degree of sedation was evaluated. In situ has been changed to on site throughout the study (line 238 - 240).

line 161: isn't 120 minutes after ACP administration a bit early for the higher dosage of ACP to determine end of tranquilisation?

Answer: The final time point of 120 min was chosen for convenience, to avoid restricting the horses to the stall for longer periods. This has been included in line 253. The experimental time-points chosen for Phase II have been further explained in lines 251 - 252.

line 162: "did not communicate with each other" instead of "between them"

Answer: Included, line 256.

line 174-176: why did you choose this 80% as the correlation that was needed to start the main evaluations?

Answer: The Spearman correlation was performed before starting the main evaluations to guarantee that the evaluators were deeply involved in their observations and were, therefore, reliable. This has been included in lines 273 – 276. According to Streiner (2015), the interpretation of Spearman's correlation is: 0 - 0.35, low correlation; 0.35 - 0.7, medium correlation; 0.7 - 1.0, high correlation. We considered 80% a good starting point for the main evaluations. The reason why this correlation was chosen and the reference have been included in lines 273 - 277.

line 183-184: you mention over here that the observers were asked to score the NRS first and after that the FaceSed. Why did you choose this order? The main research question was about FaceSed and not about the NRS scoring, right? Might this previous NRS scoring have influenced the following FaceSed scoring? This could be a source of bias that is incorporated in the study design.

It would have been much better if the NRS scores and FaceSed scores had been disconnected from each other. Another problem is that you analyze the correlation between the NRS and the FaceSed, but because these were scored immediately after one another, these observations were not independent and the correlation is therefore not valid.

Answer: Although the authors agree that it would be better to assess the NRS disconnected from the FaceSed, it is part of the validation process (specifically the concurrent criterion validation) to compare the proposed scale (FaceSed) against another one that is used for the same purpose (NRS) and at the same time. According to the literature mentioned in the manuscript, this should be performed simultaneously (or sequentially). Considering the amount of data, it would not be feasible to perform one separate analysis for each scale. Other studies that validated pain scales used similar methodologies when working with extensive data sets. This topic has been included in the discussion as a bias, to consider the Reviewer's point (lines 616 - 624).

line 193: "experience in sedation" instead of "experience of sedation

Answer: This sentence has been rephrased.

Statistical analyses

line 204: "...who attributed to the importance...."

Answer: Included, line 316.

line 210: what do you mean with "the following analysis"?

Answer: Changed to analysis described below, line 322.

line 211: what do you mean with in-situ?

Answer: In-situ has been changed to on site throughout the manuscript.

line 230: for concurrent criteria validation, the reviewer thinks that it is not the most logical choice to compare the FaceSed with the NRS, since this scale also holds some subjectivity and since these were assessed and taken immediately after one another, they are not independent. Therefore, it would be better to compare FaceSed only with the HHAG%, since this is a completely objective and independent parameter.

Answer: The authors agree with the points raised by the reviewer about the NRS. This flaw has been included in the discussion as described above (lines 616 - 624). Both NRS and HHAG% were considered for concurrent criterion validity because, as reported in lines 628 - 630, although HHAG% is a good method to assess depth of sedation, it is not very applicable for tranquilisation because acepromazine doses do not influence HHAG% in the same way as the alpha-2 agonists. Therefore, both HHAG% and NRS results are available for the reader. The paragraph about this topic has been rephrased and literature included to address the reviewer's point and the reason behind the correlation between FaceSed and both instruments (NRS and HHAG%).

line 241-242: this sentence is not grammatically sound, what do you want to explain with this sentence?

Answer: The paragraph has been rephrased (lines 360 – 361).

line 243: I think you would need to explain what the eigenvalues and variance mean. These are not necessarily parameters that all readers are familiar with.

Answer: Explanation included in lines 363 – 366. “The eigenvalues and variance are coefficients extracted from the correlation matrix of the PCA that indicate the degree of contribution of each dimension, helping to select only the representative dimensions”.
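
A minimal sketch of how such eigenvalues and explained variance can be obtained is shown below, assuming the four FaceSed item scores are available in a table; the file name and column layout are hypothetical and this is not the authors' code.

```python
# Minimal sketch (assumed data layout): eigenvalues, explained variance and
# first-dimension weights from a PCA on the standardised FaceSed items.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

items = pd.read_csv("facesed_items.csv")   # hypothetical file with the four item columns
z = StandardScaler().fit_transform(items)  # standardising ~ PCA on the correlation matrix

pca = PCA().fit(z)
print("eigenvalues:", pca.explained_variance_)            # contribution of each dimension
print("proportion of variance:", pca.explained_variance_ratio_)
print("first-dimension weights:", pca.components_[0])     # how each item contributes to PC1
```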

line 255: the range of acceptable values for the item-total correlation of 0.3 to 0.7 seems to have a very low lower margin for acceptable Spearman correlation? A correlation of 0.3 would seem to be very low and not acceptable according to the reviewer?

Answer: The authors agree with the reviewer that correlations of 0.3 are low, but the values of item-total correlation are interpreted differently from other types of correlation calculation. This range is supported by the validation procedures in the literature, and the reason behind this range has now been better explained in the discussion (lines 668-672). Please let us know if this is now clear.
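
For illustration, one common form of this analysis, the corrected item-total Spearman correlation (each item against the sum of the remaining items), can be sketched as follows; the data layout is assumed and this is not necessarily the authors' exact procedure.

```python
# Illustrative sketch: corrected item-total Spearman correlations, i.e.
# each item correlated with the sum of the remaining items (assumed layout).
import pandas as pd
from scipy.stats import spearmanr

items = pd.read_csv("facesed_items.csv")  # hypothetical file with the four item columns
total = items.sum(axis=1)

for col in items.columns:
    rest = total - items[col]             # total score without the item itself
    rho, _ = spearmanr(items[col], rest)
    print(f"{col}: item-total rho = {rho:.2f}")
```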

line 268: I would change the text into "...the frequency of sedation scores that were assigned by the evaluators...."

Answer: Changed (line 414).

line 272: you could add "statistical significance was accepted at P<0.05"?

Answer: Included - line 422.

Results

line 295: the range of repeatability of the FaceSed sum of 0.74-0.94 seems to be the combined range of all 4 observers taken together. This should be mentioned like this in this line.

Answer: Included - line 445.

line 298: instead of "steps" it would be better to use the term "observations"

Answer: Corrected– line 450.

line 372: as mentioned earlier, a correlation of 0.3 does not seem to be acceptable according to the reviewer.

Answer: Please find the explanation above. This has been included in the discussion (lines 668 - 672).

line 373: in this line you state that all parameters showed an item-total correlation that was acceptable (although I would doubt this with a correlation of 0.3), except for the item eyes. What was the item-total correlation for this parameter?

Answer: Included between brackets (line 528) and the result is also presented in Table 6.

line 378-379: The Cronbach's alpha was used as a measure of internal consistency. With what was the FaceSed score compared to determine this value?

Answer: The internal consistency assessed by Cronbach's alpha coefficient investigates whether the items of the scale are showing a consistent (similar) response. When a scale is developed, it is expected that the scores would correlate well with one another. Essentially, internal consistency represents the average of the correlations among the items of the scale. A more detailed explanation has been included in the methods (line 369) and discussion (lines 662 - 664).
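
Since alpha is computed from the FaceSed items alone, a minimal sketch (using the same hypothetical item table as above) could look like this:

```python
# Minimal sketch: Cronbach's alpha computed directly from the item scores,
# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score).
import pandas as pd

items = pd.read_csv("facesed_items.csv")  # hypothetical file with the four item columns
k = items.shape[1]

item_variances = items.var(axis=0, ddof=1).sum()
total_variance = items.sum(axis=1).var(ddof=1)

alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```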

Discussion:

line 409: "low degree of sedation" instead of "low sedation"

Answer: Corrected, line 575.

line 427-428: could you maybe discuss this difference, for instance due to the technically more difficult task of observing ridden horses?

Answer: Included, lines 601 – 603.

line 432: "the biases that could affect.."

Answer: Included, line 605.

line 436-437: you are right that scoring of the FaceSed did not bias scoring of the NRS, but the other way around, there might be a reason for bias. The fact that the NRS was first scored and immediately after that the FaceSed, might have influenced the reproducibility and repeatability of FaceSed and the correlation between FaceSed and NRS. This is what you mention in line 446, however, this high correlation might be due to how the scoring was performed.

Answer: Thanks for pointing that out. This bias has been included in lines 615-624 as well as the reason why they had to be simultaneously assessed for concurrent criterion validity (lines 622 - 624).

line 448: this is not how you describe it earlier in your materials and methods: there, you say that for criterion validity, the FaceSed is compared to the NRS and the HHAG%.

Answer: The paragraph has been rephrased. The HHAG% is considered the gold standard, however, both the HHAG% and the NRS were used for comparing to FaceSed as described in other studies. Lines 632-639.

line 461: instead of "...tranquilisation and low and high sedation.." you could better formulate "...tranquilisation with ACP from low and high sedation intensities with detomidine (with or without methadone)". Maybe could you also hypothesize about a possible reason for this?

Answer: Corrected as suggested and the possible reason justified in lines 650 - 652.

line 484: in this paragraph you discuss sensitivity and specificity. Was it possible to determine cut-off values that optimally discriminate between non-sedated and sedated horses?

Answer: The cut-off value has been included in the statistical analysis, lines 397 – 411, and in the results, lines 546 – 556.
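
One common way to derive such a cut-off is a ROC analysis with the Youden index; the sketch below is a generic illustration with hypothetical data, not necessarily the procedure used in the manuscript.

```python
# Generic illustration (hypothetical data): choosing the FaceSed cut-off that
# best separates sedated from non-sedated photos via ROC and the Youden index.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

sedated = np.array([0, 0, 0, 1, 1, 1, 1, 1])  # hypothetical labels (1 = sedated)
facesed = np.array([1, 2, 3, 4, 6, 5, 7, 6])  # hypothetical FaceSed total scores

fpr, tpr, thresholds = roc_curve(sedated, facesed)
best = np.argmax(tpr - fpr)                    # Youden index = sensitivity + specificity - 1

print(f"AUC = {roc_auc_score(sedated, facesed):.2f}")
print(f"cut-off = {thresholds[best]}, sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")
```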

line 494-497: you describe the difference between orbital closure due to being relaxed compared to due to being in pain. This lack of discriminatory power on this parameter could also lead to a false positive score; could you underline this better maybe?

Answer: The sentence has been rephrased and the suggestion of the reviewer included, lines 704 – 708.

line 513: instead of "of", you could better use "..tranquilisation from..."

Answer: Corrected, line 743.

________________________________________

Attachment

Submitted filename: Response to reviewers.docx

Decision Letter 1

Chang-Qing Gao

19 Apr 2021

PONE-D-20-32183R1

Development and validation of the facial scale (FaceSed) to evaluate sedation in horses

PLOS ONE

Dear Dr. Luna,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 03 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Chang-Qing Gao

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I re-read this study with a lot of interest once again. I congratulate the authors for carefully responding to all the comments and making changes accordingly. I believe the manuscript is now much more informative and clearer. Although it is a long read, it is a good example of a study that has enough detail for replication and for fully understanding what has been done. I have just a couple of minor comments now, please see below. And also a small note about line numbering in the highlighted document with the changes. I know this is a painstaking process to get right (I always struggle myself whenever doing it in my own papers), but most of the lines mentioned in the answers are not the same as in the highlighted manuscript. This made me take a little while longer to find all the changes, so it is something to keep in mind for future reviews. In any case, well done!

---Minor comments:---

Line 43-45: Explain briefly why this is a limitation. Is it one limitation or two? As in, using pictures AND the horses being docile? (This might be explained in the text, but I was a bit confused when reading it here only.)

Line 75: Do these scales require experience with the scale and/or horses? So only certain expert observers can use them? Is there any assessment of quality for how the observers use these scales (other than inter-observer reliability)?

While the clarifications and the author answers were all fantastic, I still struggled to follow the protocols and methodologies, particularly with the different phases and what belongs to what study. What about having a flow diagram to make it clear and easy to understand what was done, when, and for what studies?

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jun 1;16(6):e0251909. doi: 10.1371/journal.pone.0251909.r004

Author response to Decision Letter 1


28 Apr 2021

PONE-D-20-32183

Development and validation of the facial scale (FaceSed) to evaluate sedation in horses

Reviewer #1: I re-read this study with a lot of interest once again. I congratulate the authors for carefully responding to all the comments and making changes accordingly. I believe the manuscript is now much more informative and clearer. Although it is a long read, it is a good example of a study that has enough detail for replication and for fully understanding what has been done. I have just a couple of minor comments now, please see below. And also a small note about line numbering in the highlighted document with the changes. I know this is a painstaking process to get right (I always struggle myself whenever doing it in my own papers), but most of the lines mentioned in the answers are not the same as in the highlighted manuscript. This made me take a little while longer to find all the changes, so it is something to keep in mind for future reviews. In any case, well done!

Answer:

Dear Reviewer

We appreciate your comments and thank you for your final suggestions.

Please find below each point answered separately.

Line 43-45: Explain briefly why this is a limitation. Is it one limitation or two? As in, using pictures AND the horses being docile? (This might be explained in the text, but I was a bit confused when reading it here only.)

Answer: Included.

Line 75: Do these scales require experience with the scale and/or horses? So only certain expert observers can use them? Is there any assessment of quality for how the observers use these scales (other than inter-observer reliability)?

Answer: These scales require experience "with the effects of sedation in horses" (included). This point has also been addressed in the last sentence of the same paragraph: "Unidimensional scales (VAS, NRS and SDS) may be biased to the interpretation and experience of the evaluator, generating differences in results with doubtful representativeness when comparing studies".

While the clarifications and the author answers were all fantastic, I still struggled to follow the protocols and methodologies, particularly with the different phases and what belongs to what study. What about having a flow diagram to make it clear and easy to understand what was done, when, and for what studies?

Answer: a flow chart has been included as requested.

Attachment

Submitted filename: Response to reviewers April 24th 2021.pdf

Decision Letter 2

Chang-Qing Gao

6 May 2021

Development and validation of the facial scale (FaceSed) to evaluate sedation in horses

PONE-D-20-32183R2

Dear Dr. Luna,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Chang-Qing Gao

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thanks for addressing the few last comments. Great flow chart as well, which now makes it really clear what was done. I have no further comments.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Acceptance letter

Chang-Qing Gao

20 May 2021

PONE-D-20-32183R2

Development and validation of the facial scale (FaceSed) to evaluate sedation in horses.

Dear Dr. Luna:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Chang-Qing Gao

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Flowchart of the data collection methodology for a facial sedation scale in horses.

    (TIF)

    S2 Appendix. Guidelines for evaluation of a facial sedation scale in horses.

    (DOCX)

    S1 Data

    (XLSX)

    Attachment

    Submitted filename: Response to reviewers.docx

    Attachment

    Submitted filename: Response to reviewers April 24th 2021.pdf

    Data Availability Statement

    Data were included as a Supporting information file (Data.FaceSed).

