Testing and characterization of herding dogs’ behaviors

Boris Lasserre; Barbara Ducreux; Marjorie Chassier; Louise Joly; Pascal Cacheux; Thierry Le Morzadec; Stéphanie Dayde-Fonda; Caroline Gilbert

doi:10.1093/jas/skae157

2024 Jun 21;102:skae157. doi: 10.1093/jas/skae157

PMCID: PMC11222981 PMID: 38902885

Abstract

Breeding for phenotype in herding dogs (HDs) mainly relies on their performance in national field trial competitions, which has been shown to be inadequate for identifying HDs suited for real livestock farming conditions. In this study, a different field trial with a new scoring system consisting of 28 items was designed to assess young HDs, the results of which culminated in a statement of adequate phenotype (AP) or non-adequate phenotype (NAP). An AP HD was defined as being able to control the direction of a flock, to keep it grouped close to a handler when needed, to confront the animals it is dealing with in a respectful manner, and to create movement of the flock without excessive disturbance and without threatening or attacking it through chasing or uncontrolled biting. This innovative trial is composed of a pre-test (PT) and a test (T) phase. To evaluate its efficiency in detecting AP/NAP, 460 French Border Collies aged between 8 and 24 mo underwent the trial. Its average duration (PT + T) was 3 min and 16 s (SD = 26 s). According to experts’ assessments (Gold Standard), 16.5% of tested HDs reached an AP score, and the Idele scoring system correctly identified 93.3% of them (sensitivity). Specificity and accuracy were 96.1% and 95.7%, respectively (P-value < 0.0004). Recursive feature elimination identified 25 of the 118 features (categories of items) from the scoring system as significant predictors of AP/NAP. An AP HD was statistically defined as a dog that completed the PT and T phases, showed keenness, correct position in relation to the handler, and absence of prey drive. Four environmental effects significantly influenced AP/NAP: the field trial session, the owner’s experience with HDs, the conditions of the HD’s first contact with livestock, and the type of livestock with which the HD is accustomed to working (P-values < 0.0005, < 0.05, < 0.05, and < 0.007, respectively). Inter-evaluator agreement was substantial (0.70).
The field trial proved to be a short, easily implemented, standardized, reproducible method for detecting AP/NAP. Hence, the field trial and its scoring system could provide a basis for a breeding program based on phenotype pending additional testing of HDs and genetic analyses.

Keywords: adequate phenotype, herding traits, field trial, herding dog, scoring system, breeding selection

A new herding dog field trial, with a specific scoring system, was set up and implemented in France with 460 young Border Collies. This trial was found to correctly detect an adequate phenotype for French farmers’ needs, and is short, easy to set up, standardized, and reproducible.

Introduction

Herding dogs in France

In the context of the waning appeal of the French agricultural sector, it is becoming imperative to propose solutions that will enhance the well-being and efficiency of farmers in their day-to-day work. Amongst these solutions, the herding dog (HD) emerges as a particular asset to livestock farming. HDs provide farmers with efficiency (increasing their autonomy and saving them time), safety (making animals more docile), and comfort (reducing physical effort and the stress that can result from animal handling) (Ducreux, n.d.). Whilst determining the exact financial contribution of HDs to stock farming in France is not easy, in Australia it has been estimated that typical working dogs provide farmers with at least a five-fold return on investment (Arnott et al., 2014).

In France, HDs work with a wide range of livestock, including cows, sheep, goats, poultry, and other animals (Ducreux, 2015). No official census exists, making it impossible to determine the exact number of HDs working on French farms. However, the Border Collie (BC) is widely recognized as the predominant breed within this field (Conessa, 2013).

Given the crucial role that HDs play in the daily tasks on farms, selectively breeding them for appropriate herding traits is important for farmers. Currently, these HDs are primarily chosen based on the performance records of their ancestors in competitive sheepdog trials (Conessa, 2013). In most competitive trials, the handler (who does not move initially from the starting point) directs the HD to navigate a predetermined course with a small flock of sheep.

Performance in competitive trials relies mainly on two factors. Firstly, it requires a dog with high learning ability, which is generally the case for HD breeds (Courreau, 2012), particularly the BC breed, which is known for being one of the most trainable breeds (Coren, 1994). Secondly, it depends on the skill of the handler in training the dog to achieve optimal performance. Many French livestock farmers may lack these training skills or, compared to non-professional trial participants, have limited time for HD training. Given these factors, we believe that French farmers would benefit from HDs with a stronger natural interest (Courreau, 2012) and spontaneity in herding behaviors. This approach would streamline the training process and yield HDs capable of performing swiftly and effectively in authentic farm conditions. Courreau (2012) defines this interest as a particular natural interest in livestock, which is likely to be an inherited trait. This natural interest is composed of three main characteristics: 1) wanting to gather (group together) other animals, 2) wanting to keep them together, and 3) wanting to move them. These last two characteristics are often used in HD behavior studies, for example by Arvelius et al. (2013), who define them, respectively, as natural ability (the dog’s ability to foresee and counteract the livestock’s movements and thereby keep the flock together) and power (the dog’s ability to move livestock). That study also includes an observation related to the animals’ gathering phase known as the out-run, defined as how wide a circle the dog takes when moving from the handler to the balance point (the point where the dog comes in contact with the flock so as to bring the flock to the handler), assessed at a position midway between the handler and the balance point.

Herding dogs’ ability assessment methods

In France, HDs are selectively bred for phenotype, but no genetic parameters or breeding values have been estimated for HD herding behavior traits. Selection based on estimated breeding values would be more effective for behavior traits (Liinamo et al., 1997; Ruefenacht et al., 2002; Van der Waaij et al., 2008). For both breeding methods, a crucial prerequisite is the implementation of a reliable, standardized, and reproducible field trial with a scoring system to assess young HDs, at an age before training has too significant an impact on their spontaneous behaviors.

The measurements should be reliable enough to give information on genetic differences between HDs. The higher the heritability of a trait, the greater the share of the observed differences between HDs that is attributable to their genetic background. Standardization and reproducibility of measurements applied to animals intended for breeding are essential to reduce the risk of bias or random effects distorting the measurements. They would facilitate phenotyping HDs on a large scale, which is crucial to estimating accurate genetic parameters. The scoring system used should also be easy to implement, so as to collect as much data as possible. Until enough data has been recorded, the variance components needed for the prediction of breeding values cannot be reliably estimated. The current French competitive trials provide the most abundant data source for conducting quantitative genetic studies, but the field test used and the assessment method employed are not fully adapted for HD selective breeding.
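As a reminder of the quantity discussed here, narrow-sense heritability is conventionally written as the ratio of additive genetic variance to total phenotypic variance. The expression below is the standard textbook form and is not taken from this study; the symbols $\sigma^2_A$, $\sigma^2_E$, and $\sigma^2_P$ are introduced for illustration only, and the simple two-component decomposition is an assumption.

```latex
% Textbook definition (assumed notation, not from the study):
% sigma^2_A = additive genetic variance
% sigma^2_E = residual (non-additive and environmental) variance
% sigma^2_P = total phenotypic variance
h^2 = \frac{\sigma^2_A}{\sigma^2_P},
\qquad
\sigma^2_P = \sigma^2_A + \sigma^2_E
```

Estimating $h^2$ therefore amounts to estimating these variance components, which is why a large number of standardized records is needed.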

A start in HD breeding programs could be to design a test tailored to HDs with a specific scoring system, considering that certain behavioral tests or traits have already demonstrated their ability to accurately predict the future success of young working dogs in specific domains, as presented below.

Regarding existing HD-specific behavioral tests, Horn et al. (2017) showed that Norwegian HD trials provided data of low informational value, including only low-heritability traits (0.010 ≤ h² ≤ 0.056) with low repeatability (0.041 ≤ r ≤ 0.286). When considering HD selective breeding, these authors recommend test standardization and the study of traits derived from the predatory motor pattern, a sequence of instinctual behaviors expressed by wolves when hunting, which are bred for in working dogs. This last recommendation refers to the book by Coppinger and Coppinger (2001), in which this sequence is described as orient, eye-stalk, chase, grab-bite, kill-bite, dissect, and consume. According to Horn et al. (2017), citing Lord et al. (2016), HD selection may historically have consisted of attempting to intensify eye-stalk and chase while diminishing the grab-bite and kill-bite. Finally, Horn et al. (2017) recommend studying the traits included in the Herding Trait Characterization (HTC), another test used in Sweden to assess herding behaviors and studied by Arvelius et al. (2013). Heritability estimates for HTC traits (0.14 ≤ h² ≤ 0.50) were considerably higher than those observed in Norwegian trials; however, these traits were not assessed in standardized test situations, and their reproducibility has not been assessed.

However, none of these tests and guidelines were specifically designed to identify HDs with a phenotype that matches the expectations of French users. The desired phenotype needs to correspond to the HDs’ daily work on a farm, which involves working with flocks comprising dozens of individuals, and must take into account the flock behaviors that trigger the reaction of the HD. This last aspect sets it apart from the existing HTC (Swedish trials) and Norwegian trials, which did not take the flock’s behavior into account. In addition, these trials test HDs up to 4 yr old, which means that training can have a significant impact on the expression of their instinctual herding abilities. The definition of the desired phenotype was established by two HD expert trainers, who are also livestock farmers with several decades of experience in the HD field and are certified by the Institut de l’Elevage (Idele, French Livestock Institute), which aims to provide livestock farmers (of cattle, sheep, goats, or horses) with technical solutions to improve all aspects of their work. They termed it the “Adequate Phenotype” (AP). An AP HD was defined as being able to control the direction of a flock, to keep it grouped close to a handler when needed, to confront the animals it is dealing with in a respectful manner, and to create movement of the flock without excessive disturbance and without threatening or attacking it through chasing or uncontrolled biting (Ducreux, n.d.).

Since this phenotype relies on flock instinct as defined by Courreau (2012) and includes some traits whose heritabilities are estimated moderate-to-high (Arvelius et al., 2013), it should be possible to create a breeding program for it.

Aims

To address the methodological issues of the existing HD behavioral tests, the first aim of this study was to set up a new test to detect AP, using a scoring system that formalizes the empirical assessment of the experts who defined AP and non-adequate phenotype (NAP). The second aim was to obtain a precise definition of AP/NAP after testing several hundred HDs. The third aim was to investigate the potential influence of various environmental effects on AP/NAP.

Materials and Methods

The experimental protocol received ethical validation from the Comité d’Ethique en Recherche Clinique (ComERC) of the Ecole nationale vétérinaire d’Alfort (EnvA) (file No. 2020-09-25).

Field trials

A field trial was specially set up and implemented across France for this study. This trial was specific to a French project initiated by the Idele and its network of HD trainers, together with six partners: EnvA (« Ecole nationale vétérinaire d’Alfort », National veterinary school of Alfort), FUCT (« Fédération des Utilisateurs de Chiens de Troupeaux », Federation of Herding dog Users), AFBC (« Association Française du Border Collie », French Border Collie Association), SCC (« Société Centrale Canine », French Kennel Club), INRAE (« Institut national de recherche pour l’agriculture, l’alimentation et l’environnement », National Research Institute for Agriculture, Food and the Environment) and the Biological Resource Center (CRB) Cani-DNA.

A total of 18 field trial sessions were held between 2019 and 2023 in 15 different French locations, with a mean of 29 ± 6 HDs tested per session. The sessions were organized by the Idele in collaboration with the local HD users’ associations, which are affiliated with the FUCT. These associations played a pivotal role in selecting farms to host the sessions and recruiting HDs from their members and contacts.

Two criteria were considered when the field trial protocol was being designed: ease of setup and representativeness of the tasks that an HD has to perform in response to flock actions (controlling flight, confrontation, gathering or regrouping stock) under real conditions on a French farm.

To achieve this, the trial was conducted using a flock of 10 to 15 ewes that had no lambs and which were accustomed to being herded by an HD. The flock was intentionally not overly familiar with humans. For each session, two to three groups from each host farm were used to avoid the flock becoming accustomed to the field trial and anticipating it. As a result, the flocks varied between sessions and were changed once or twice within a single session. The trial took place in an open area, spanning approximately 0.5 to 1 ha, without any prominent points of attraction for the flock such as trees or the entrance to a sheep shed. HDs were held on a leash at the beginning of the trial until they noticed the flock in the test area (plot), but after this point they were allowed to move freely within the test area for the rest of the trial, unrestricted by leash or fences in their interactions with ewes.

The field trial required a minimum of four persons: a secretary responsible for welcoming participants, a cameraman positioned at an elevated spot to film the trials, an HD user accompanied by their own dog (not tested) to return the flock to its initial position between each HD test, and a “Test Leader” (TL) who was unfamiliar with both the HDs and HD training. The secretary ensured that an HD undergoing its trial could not see or hear its owner, thus minimizing distractions.

Herding dogs tested

To participate in the field trial, dogs had to be of an HD breed, properly identified, and in good health. To mitigate the potential influence of training on the behaviors exhibited during the trial and considering that AP is intended to be detected at an early stage for selection purposes, only young HDs between 8 and 24 mo of age were included in the testing.

A total of 536 HDs were tested during the project. However, 55 of them were excluded from the analyses for various reasons. Some of these HDs participated in the first field trial session (29 HDs), which was conducted to test the trial protocol itself. Other HDs were excluded because they did not meet the age criteria, because they lacked previous unrestricted contact with livestock, or because technical issues were encountered during the trial, such as an unworkable protocol or a malfunctioning camera.

During the trials, 21 non-BC dogs, either crossbred or purebred from seven other breeds (Beauceron, Pyrenean Sheepdog, Australian Kelpie, Australian Shepherd, Berger de Savoie, Australian Cattle Dog, Bouvier des Ardennes), were also tested. These 21 non-BC dogs were excluded from the analyses because of the large imbalance between BC and non-BC numbers.

The analyzed sample consisted of 460 BCs, comprising 52% females and 48% males. Their age distribution is given in Table 1.

Table 1.

Number of HDs per age class tested in the field trial

Age (mo)	Number
Unknown	5
[8-11]	115
[11-14]	114
[14-19]	109
[19-24]	117
Total	460 HDs
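The sample size can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text and in Table 1; the variable names are illustrative, and it assumes that the 55 excluded HDs and the 21 non-BC dogs are disjoint groups, which the text implies but does not state explicitly.

```python
# Counts reported in the text (assumption: the two exclusion groups do not overlap).
tested = 536
excluded_for_protocol_or_criteria = 55
excluded_non_bc = 21
analyzed = tested - excluded_for_protocol_or_criteria - excluded_non_bc

# Age distribution as given in Table 1.
age_classes = {
    "Unknown": 5,
    "[8-11]": 115,
    "[11-14]": 114,
    "[14-19]": 109,
    "[19-24]": 117,
}
table_total = sum(age_classes.values())

print(analyzed, table_total)  # both equal 460
```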

Field trial protocol

Each HD tested on the field trial was recorded on video, assessed as being AP or NAP by the experts and scored using the Idele scoring system (Tables 2 to 5), created for this study, and described in the next section.

Table 2.

Idele scoring system used for statistical analyses containing number, abbreviation, description, and possible categories for each item scored during the BT and PT phases of the field trial

Phase studied	Item number	Item abbreviation	Description of the item	Possible categories
Before testing (BT)	1	BT1	HD¹ reaction to TL² approaching and picking up the leash	□ Welcomes the TL □ Stays close to owner □ Withdraws/tries to escape □ Signs of aggression □ Not filled
	2	BT2	HD behavior on leash with TL	□ Follows TL □ Pulls back, resists □ Not filled
Pre-test (PT)	3	PT1	HD first movement when unleashed	□ Not assessable □ Tries/leaves the test area □ Does not go to the flock (does something else) □ Does not go to the flock (no movement) □ 0. Does not reach axis (cf. figure on the left) □ 1. Reaches axis □ 2. Exceeds axis □ 3. Circles around the flock
	4	PT2	Does HD go spontaneously to the flock?	□ Spontaneously goes to the flock □ Flock stimulation necessary □ Not concerned (never goes to the flock)
	5	PT_STOP	If PT is stopped, why?	□ HD tries/leaves the test area □ HD loses interest □ At least 1 chase □ Repeated predatory bites (> 3) □ Disturbs flock □ Unworkable test □ PT complete
	6	PHASE_1	Is PT phase complete?	□ Yes □ No

¹Herding dog.

²Test leader.

No gestures or oral commands were given to the HD during the field trial. The assessment process began with a “Before Testing” (BT) phase, where the HD was on leash and walked to the test area by the TL. The TL and the HD positioned themselves at a starting point (marked by a blue circle in Figure 1), which was 35 m away from an endpoint (marked by a red circle in Figure 1) where the flock was put in position. These points were indicated by posts in the test area. Figure 1 illustrates the sequential steps of the “Pre-Test” (PT) phase, where items 3 to 6 of the Idele scoring system were scored (Table 2). During the PT phase, the TL approached the flock and unleashed the HD when the HD noticed the flock (step A). The TL then started a timer, kept walking toward the endpoint (step B), and encouraged the flock to move until 1 min had passed (step C). If the HD did not show any interest in the flock during this time, the TL continued to stimulate the movement of the flock for an additional minute in an attempt to get the dog interested. At the end of the PT phase, which aimed to familiarize the HD with the new environment (including the flock, the test area, and the TL), the TL put the HD back on the leash and returned (step D) to the starting point (blue circle). If the HD did not display any interest in the flock, left the test area, or exhibited behavior that posed a threat to the well-being of the flock, the assessment ended at this stage (“PHASE_1: NO” in Table 2, item 6). However, if the HD successfully completed the PT phase (“PHASE_1: YES”, Table 2), it was deemed eligible to proceed to the next phase, known as the “Test” (T).
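The outcome of the PT phase described above can be sketched as a small decision rule. This is only an illustration: the category labels are copied from item 5 (PT_STOP) in Table 2, but the helper names `phase_1_complete` and `max_stimulation_seconds` are hypothetical, and the one-to-one link between "PT complete" and "PHASE_1: YES" is an assumed reading of the protocol.

```python
# Categories of item 5 (PT_STOP) as listed in Table 2.
PT_STOP_CATEGORIES = (
    "HD tries/leaves the test area",
    "HD loses interest",
    "At least 1 chase",
    "Repeated predatory bites (> 3)",
    "Disturbs flock",
    "Unworkable test",
    "PT complete",
)


def phase_1_complete(pt_stop: str) -> bool:
    """Hypothetical helper: PHASE_1 is 'Yes' only if PT_STOP is 'PT complete'."""
    if pt_stop not in PT_STOP_CATEGORIES:
        raise ValueError(f"unknown PT_STOP category: {pt_stop!r}")
    return pt_stop == "PT complete"


def max_stimulation_seconds(hd_shows_interest: bool) -> int:
    """Flock stimulation lasts 1 min, plus 1 extra min if the HD shows no interest."""
    return 60 if hd_shows_interest else 120


# An HD that chases a sheep during the PT does not proceed to the T phase.
print(phase_1_complete("At least 1 chase"))  # False
print(phase_1_complete("PT complete"))       # True
```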

Figure 1. Simplified illustration of the PT phase from the field trial protocol. The TL and the HD positioned themselves at a starting point (blue circle), which was 35 m away from an endpoint (red circle). The TL approached the flock and unleashed the HD when the HD noticed the flock (step A). The TL then started a timer, kept walking toward the endpoint (step B), and encouraged the flock to move until 1 min had passed (step C). At the end of the PT phase, the TL put the HD back on the leash and returned (step D) to the starting point (blue circle).

The flock was placed back at the endpoint (red circle) between the PT and T phases.

Figure 2 illustrates the sequential steps of the T phase, where items 7 to 28 were scored (Tables 3 to 5). In the T phase, the TL unleashed the HD from the starting point and started a timer (step E). Then, the TL walked to (step F) the endpoint (red circle) and stood still for 15 s (step G) at this point. Items 7 to 10 were scored from the moment the HD was unleashed to the end of these 15 s. Following that, the TL performed the following steps:

Table 3.

Idele scoring system used for statistical analyses containing number, abbreviation, description, and possible categories for each item scored during the first 15 s and the first rotation phase of the test from the field trial

Phase studied	Item number	Item abbreviation	Description of the item	Possible categories
First 15 s of the test (T¹)	7	START_T	HD² first movement when unleashed	□ No test □ Not assessable □ Tries/Leaves the test area □ Does not go to the flock (does something else) □ Does not go to the flock (no movement) □ 0. Does not reach axis (cf. figure on the left) □ 1. Reaches axis □ 2. Exceeds axis □ 3. Circles around the flock
	8	MOV_T	Movement of flock when HD starts circumvention	□ No test □ Not filled □ No movement □ At least 1 sheep moving
	9	GRP_T	Flock reaction to HD going around	□ No test □ Not filled □ Flock grouped □ Flock split + HD regroups □ Flock split
	10	SEC15_T	Flock distance to TL³ after 15sec motionless	□ No test □ Not filled □ Close to TL □ Far from TL
First rotation (T)	11	POS_S1	HD position during first rotation	□ No test □ Not filled □ Position in relation to TL □ Stays still or does something else □ Bites, Hits or Chases a sheep □ Interacts (with a sheep) □ Circles around the flock
	12	AXIS_S1	HD position when TL stops first rotation	□ No test □ Not filled □ 0. Does not reach axis □ 1. Reaches axis □ 2. Exceeds axis □ Not assessable
	13	GRP_S1	Flock reaction to first rotation	□ No test □ Not filled □ Grouped □ Split

¹Test.

²Herding dog.

³Test leader.

Walked around the flock clockwise for 360°: “first rotation” (step H). Items 11 to 13 were scored (Table 3).
Stood still for 15 s (step G).
Walked around the flock anti-clockwise for 360°: “second rotation” (step I). Items 14 to 16 were scored (Table 4).
Stood still for 15 s (step G).
Walked 30 steps away from the flock to the center of the test area, with the flock behind the TL: “power” (step J). Items 17 to 21 were scored (Table 4).

Table 4.

Idele scoring system used for statistical analyses containing number, abbreviation, description, and possible categories for each item scored during second rotation phase and power test phase of the field trial

Phase studied	Item number	Item abbreviation	Description of the item	Possible categories
Second rotation (T¹)	14	POS_S2	HD² position during second rotation	□ No test □ Not filled □ Position in relation to TL³ □ Stays still or does something else □ Bites, Hits or Chases a sheep □ Interacts (with a sheep) □ Circles around the flock
	15	AXIS_S2	HD position when TL stops second rotation	□ No test □ Not filled □ 0. Does not reach axis □ 1. Reaches axis □ 2. Exceeds axis □ Not assessable
	16	GRP_S2	Flock reaction to second rotation	□ No test □ Not filled □ Grouped □ Split
Power (T)	17	START_W	HD behavior when TL starts walking away	□ No test □ Not filled □ Goes to flock □ Motionless □ Far from the flock □ Circles around the flock □ Loses interest
	18	MOV_W	HD movements throughout power phase (3 categories can be chosen)	□ No test □ Not filled □ Flanks □ Goes straight □ Circles around
	19	FLW_W	Does flock follow TL throughout power phase?	□ No test □ Not filled □ Follows □ Does not follow □ Passes TL □ Not assessable
	20	GRP_W	Does flock stay in a group throughout power phase?	□ No test □ Not filled □ Grouped □ Split □ Not assessable
	21	DST_W	Flock distance to TL throughout power phase	□ No test □ Not filled □ Close to TL □ Far from TL □ Not assessable

¹Test.

²Herding dog.

³Test leader.

Items 22 to 28 (Table 5) covered the whole T phase. After the “power” phase, the T ended, and the TL put the HD back on leash and brought it back to its owner.

Table 5.

Idele scoring system used for statistical analyses containing number, abbreviation, description, and possible categories for each item scored during the entire test phase of the field trial

Phase studied	Item number	Item abbreviation	Description of the item	Possible categories
Entire test (T¹)	22	BITE	Does HD² bite a sheep without a previous interaction?	□ YES □ NO
	23	NB_BITES	If so, how many times?	□ 0 □ 1 □ 2 □ 3 □ 4
	24	FOCUS	Does HD stay focused during test?	□ No test □ Not filled □ Yes □ No
	25	MOTIV	Is HD motivated?	□ No test □ Not filled □ Yes □ No
	26	INTER	Interactions between HD and flock	□ No test □ Not filled □ No confrontation □ HD wins confrontation □ HD loses confrontation
	27	T_STOP	If T is stopped, why? (2 categories can be chosen)	□ No test □ Not filled □ T complete □ HD tries/Leaves the test area □ HD loses interest □ At least 1 chase □ Repeated predatory bites (> 3) □ Disturbs flock □ Unworkable test
	28	PHASE_2	Is the T phase complete?	□ Yes □ No

¹Test.

²Herding dog.

Idele scoring system creation and use

The Idele scoring system (supplementary material) was inspired by published references (in particular Arvelius et al., 2013 and Horn et al., 2017) and by national experts’ recommendations made in preliminary working groups, to assess AP/NAP. These preliminary working groups studied assessments of HDs in the available literature, to try to objectify and put into words the subjective criteria used by the experts when assessing an HD, in order to create the Idele scoring system. Several tests were conducted to make the resulting scoring system as practical and simple as possible. This scoring system, in its final version, incorporated eight traits (Sociability, Balance, Natural ability, Focus, Power, Oscillating movements, Out-run, Bite) from HTC version 1, with heritability estimates ranging from 0.26 to 0.48 (Arvelius et al., 2013).

All 460 HDs were scored using the Idele scoring system outlined in the supplementary material. Scoring was carried out by a single individual (BD), who was not an expert in HD behavior, by watching videos of the field trials.

The final assessment of AP/NAP was made by the experts (PC and TLM). They reached a consensual agreement on-site during the field trial sessions or, if unable to attend, by watching the videos. Their conclusions were considered the Gold Standard for the study.

Tables 2 to 5 show the Idele scoring system organized in the format used for statistical analysis. It consisted of 28 items (variables), of which 27 were qualitative and one was quantitative (i.e., the number of bites during the entire test). Among all 28 items, 2 referred to the BT phase (items 1 to 2: BT1, BT2, in Table 2), 4 related to the PT phase (items 3 to 6: PT1, PT2, PT_STOP, PHASE_1, Table 2), and the remaining 22 related to the T phase. Each of the items scored was multiple-choice, with between two and eight possible categories, including “No test” and “Not filled” categories for items that were not scored due to interruption of the trial (“No test”) or no action of the HD (“Not filled”). For example, the “flock reaction to HD going around” (item 9: GRP_T, Table 3) was not scored if the HD did not actually go around the flock at item 7 (START_T, Table 3).

The 28 items comprising the Idele scoring system, presented in Tables 2 to 5, referred to one or several herding behaviors, listed below, which the experts considered were implicated at different levels in AP/NAP. The attribution of items according to these behaviors is as follows:

HD’s keenness when interacting with animals (PT_STOP, T_STOP, START_T, POS_S1, POS_S2, START_W, MOTIV).
HD’s focus on the target animals (FOCUS).
HD’s interaction with the handler in the presence of the test flock (PT_STOP, T_STOP, START_T, SEC15_T, POS_S1, AXIS_S1, POS_S2, AXIS_S2, START_W).
HD’s predatory behavior toward the test flock (PT_STOP, T_STOP, POS_S1, POS_S2, BITE, NB_BITES).
HD’s ability to contain the test flock for the handler (START_T, AXIS_S1, AXIS_S2).
HD’s ability to move the test flock for the handler (START_W, FLW_W, GRP_W, DST_W).

Supplementary data

Pedigree data of HDs was received from the SCC and additional information was gathered through a questionnaire completed by the owners of the HDs. This questionnaire provided valuable insight into several environmental effects and their potential associations with AP/NAP. These effects were age at the time of the test, sex, age when acquired by the owner, owner’s experience with HDs, how the HD was acquired (bought, gift, born on the farm), criteria for choosing puppy (French stud-book, sex, working parents or not, advice from someone, appearance, sociability, other), owner’s participation in training courses (with this HD or another one), housing, frequency of training or working sessions, conditions of first contact with livestock, and the livestock on which the HD was accustomed to working.

Statistical analyses

All statistical analyses were performed with RStudio software.

The first analysis assessed the performance of the Idele scoring system in predicting AP/NAP. The randomForest function (Breiman, 2001; Liaw and Wiener, 2001), a machine-learning algorithm, was used to identify the items that accurately differentiate AP from NAP HDs and to create an AP/NAP prediction model. Each category of an item was considered as a feature. This machine-learning technique accounts for nonlinear relationships and dependencies among all features. The Random Forest (RF) algorithm was executed with the following parameters: ntree = 500; mtry = sqrt(p), where p is the number of input features. Each feature was given an importance score (MDA: Mean Decrease Accuracy) based on the increase in error caused by removing that feature from the predictors. RF uses about 80% of the samples (368 HDs) in the dataset as a training set by randomly sampling with replacement and validates the selected features using the remaining “out-of-bag” samples. This split ratio was chosen because it is common in statistics and produces good results according to empirical studies (Gholamy et al., 2018). The Idele scoring system’s sensitivity (Se), specificity (Sp), and accuracy (ACC) for predicting AP/NAP were calculated against the experts’ assessment.
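For readers less familiar with these three metrics, the sketch below shows how Se, Sp, and ACC follow from a confusion matrix in which the experts' assessment is the reference. The function name and the counts are hypothetical: the counts were chosen only because they are compatible with the percentages reported in the abstract for a held-out set of 92 HDs (20% of 460), and they are not taken from the study.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: AP dogs predicted AP      fn: AP dogs predicted NAP
    tn: NAP dogs predicted NAP    fp: NAP dogs predicted AP
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not reported in the study).
se, sp, acc = diagnostic_metrics(tp=14, fn=1, tn=74, fp=3)
print(f"Se={se:.1%} Sp={sp:.1%} ACC={acc:.1%}")
# prints: Se=93.3% Sp=96.1% ACC=95.7%
```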

Then, AP/NAP characterization was done in three steps. First, the most important features from the previous RF model were identified by recursive feature elimination (RFE). RFE is a sequential method of model construction that reduces the number of features; these are removed in order of importance until an optimal model is achieved based on the best cross-validation performance with the smallest number of features (Kuhn and Johnson, 2013; Bulut, 2021). RFE was used to select the smallest number of features that provided the best accuracy in assessing HDs as AP or NAP. The method consists of running five iterative RF models with cross-validation, based on five different training set splits. For each model, at each iteration, the feature with the lowest importance is removed until only one feature remains at the end. Each iteration is associated with a model performance, enabling the algorithm to select the model with the best performance and return the features associated with it. This process is repeated 10 times for each RF model to improve the reliability of the result; to guarantee the reproducibility of the results, the random seed was fixed. The best-performing model was identified based on mean prediction accuracy and Cohen’s kappa value. To execute RFE, the data had to be rearranged into a matrix consisting of binary coding. All items of the scoring system had first been transformed into factorial variables. This means that an item with X categories is transformed into X binary “Yes”/“No” features, depending on whether the level is present or not. With this transformation, 118 features were obtained. Secondly, the impact of each feature on the probability of assessing AP was observed by examining partial dependencies, which show the incremental impact that one or two features exert on the predicted outcome of a machine-learning model (Friedman, 2001). Thirdly, a new RF prediction model was constructed using the features identified by RFE to verify the adequacy of these identified features in predicting AP/NAP.
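
The binary coding step described above (an item with X categories becomes X “Yes”/“No” features) can be illustrated with a short Python sketch. The function name `to_binary_features` and the three toy records are hypothetical; they are not taken from the study data and only show the shape of the transformation.

```python
def to_binary_features(records):
    """Turn {item: category} records into 0/1 indicator features."""
    # Every (item, category) pair observed in the data becomes one feature.
    pairs = sorted({(item, cat) for rec in records for item, cat in rec.items()})
    names = [f"{item}={cat}" for item, cat in pairs]
    # A cell is 1 when the dog expressed that category for that item.
    matrix = [[int(rec.get(item) == cat) for item, cat in pairs] for rec in records]
    return names, matrix


# Hypothetical scores for three dogs on two items.
dogs = [
    {"FOCUS": "YES", "BITE": "NO"},
    {"FOCUS": "NO", "BITE": "YES"},
    {"FOCUS": "Not filled", "BITE": "Not filled"},
]
names, matrix = to_binary_features(dogs)
for name, column in zip(names, zip(*matrix)):
    print(name, list(column))
```

Each dog has exactly one active feature per item, so every row of the matrix sums to the number of items scored.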

The environmental effects tested on AP/NAP were: sex; age at the time of the test ([8–11], [11–14], [14–19], [19–24] mo); field trial session; how the HD was acquired (bought, gift, born on the farm); criteria for choosing puppy (French stud-book, sex, working parents or not, advice from someone else, appearance, sociability, other); owner’s experience with HD (yes or no); owner’s participation in training courses (with this HD or another one); housing (kennel or other); frequency of training or working sessions (< once a week, once a week, several times a week, every day); conditions of first contact with livestock (loose, on leash, enclosed space, other); and the livestock on which the HD was accustomed to working (sheep, cattle, mixed species). The environmental effects on AP/NAP were estimated one by one using the glm (Fitting Generalized Linear Models) function. This function uses generalized linear regression principles, fitting the model for non-normal distributions, in this case using a binomial error distribution (Pedamkar, 2019).
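
To make the one-by-one binomial model concrete, the sketch below fits a logistic model with a single yes/no predictor in closed form. For such a predictor, the maximum-likelihood coefficients are the log-odds in the reference group and the log odds ratio, and the Wald test uses the usual standard error of a log odds ratio. The function name `binary_logit_fit` and the counts are hypothetical and do not come from the study; the authors' analysis used the glm function in R.

```python
import math


def binary_logit_fit(ap_yes, nap_yes, ap_no, nap_no):
    """Closed-form fit of logit(P(AP)) = b0 + b1 * x for one 0/1 predictor x."""
    b0 = math.log(ap_no / nap_no)           # log-odds of AP when x = 0
    b1 = math.log(ap_yes / nap_yes) - b0    # log odds ratio for x = 1 vs x = 0
    se_b1 = math.sqrt(1 / ap_yes + 1 / nap_yes + 1 / ap_no + 1 / nap_no)
    z = b1 / se_b1                          # Wald statistic
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return b0, b1, se_b1, p_value


# Hypothetical counts: owners with experience (30 AP, 70 NAP)
# versus owners without experience (15 AP, 85 NAP).
b0, b1, se_b1, p_value = binary_logit_fit(30, 70, 15, 85)
print(f"odds ratio = {math.exp(b1):.2f}, P value = {p_value:.3f}")
```

Effects with more than two levels (for example the field trial session) need an iterative fit, which is what a GLM routine provides.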

Subsequently, the reproducibility of the Idele scoring system was calculated through the agreement between the three evaluators. In addition to DB, whose items’ scores were used for the previous statistical analyses, two persons involved in the project, with no prior experience with HDs and who were not involved in the creation of the Idele scoring system, also independently scored the same HDs by observing the videos of their tests. They received a brief explanation of the Idele scoring system via a 30-min presentation and viewed an example of a test video. The agreement between the three evaluators was calculated with Fleiss’ kappas for all items of the Idele scoring system, excluding the first two items (BT phase: BT1 and BT2), which were not captured on video. To calculate Fleiss’ kappas, 101 HDs were scored by these three evaluators. Fleiss’ kappa is a measure of the agreement between evaluators that extends Cohen’s kappa to two or more evaluators when the evaluation method is measured on a categorical scale (Aprilliant, 2021). Fleiss’ kappa takes a value between −1 and 1: a value of 1 indicates perfect agreement, a value of 0 indicates the agreement expected by chance, and negative values indicate agreement poorer than chance (Fleiss, 1971).
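
The agreement statistic described above can be sketched in a few lines of Python. This follows the standard Fleiss (1971) formula, with observed agreement averaged over subjects and chance agreement computed from the overall category proportions. The function name `fleiss_kappa` and the toy ratings are hypothetical and are not taken from the 101 scored HDs.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """ratings: one list of category labels per subject, one label per rater."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if any(len(subject) != n_raters for subject in ratings):
        raise ValueError("every subject needs the same number of ratings")
    totals = Counter()
    observed = 0.0
    for subject in ratings:
        counts = Counter(subject)
        totals.update(counts)
        # Share of rater pairs that agree on this subject.
        pairs = sum(c * c for c in counts.values()) - n_raters
        observed += pairs / (n_raters * (n_raters - 1))
    p_bar = observed / n_subjects
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when one category is always used
    return (p_bar - p_e) / (1 - p_e)


# Three evaluators, two dogs: full agreement, then systematic disagreement.
perfect = fleiss_kappa([["YES"] * 3, ["NO"] * 3])
mixed = fleiss_kappa([["YES", "YES", "NO"], ["NO", "NO", "YES"]])
print(perfect, mixed)
```

The second toy case gives a negative value, illustrating agreement below what chance alone would produce.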

Finally, a Multiple Correspondence Analysis (MCA) on all 28 items was performed to identify similarities between HDs and associations between the items (Josse, 2010; Husson and Josse, 2014). Hierarchical Clustering on Principal Components (HCPC) (Husson et al., 2023) was employed to investigate whether more than two “profiles” (AP/NAP) would emerge by applying it to the principal components (representing the first 10 dimensions) obtained through MCA. These 10 dimensions were selected because they captured a significant portion of the data set’s variability and because this was the maximum number which could be interpreted by the experts. The objective of this analysis was to group HDs based on the shared items’ characteristics. The clustering was performed using the HCPC function (FactoMineR package). The optimal number of clusters was determined with the gap statistic method (Tibshirani et al., 2001), which compares the within-cluster dispersion of the data to a null reference distribution, such as could be expected from a random uniform distribution. Clusters’ interpretation was based on v-test values. The v-test reflects the differences between the frequency of a category of an item within the cluster and the overall population (N = 460 HDs). A high v-test value indicates a higher frequency within a cluster compared to the general population. The category of an item could be over- or under-represented (positive or negative v-test values, respectively) (Husson et al., 2017). A v-test value exceeding 1.96 corresponds to a P value smaller than 0.05.
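
The link between a v-test value and its P value can be illustrated as follows. The sketch uses the normal approximation to the hypergeometric distribution, which compares the observed count of a category in a cluster with the count expected from the whole population. This approximation is an assumption made for illustration: the FactoMineR implementation may derive the value differently. The function names and the counts are hypothetical.

```python
import math


def v_test(n_cluster_cat, n_cluster, n_cat, n_total):
    """Approximate v-test of a category within a cluster (normal approximation)."""
    p = n_cat / n_total                      # category frequency in the population
    expected = n_cluster * p                 # expected count in the cluster
    variance = n_cluster * p * (1 - p) * (n_total - n_cluster) / (n_total - 1)
    return (n_cluster_cat - expected) / math.sqrt(variance)


def two_sided_p(v):
    """Two-sided P value of a standard normal deviate."""
    return math.erfc(abs(v) / math.sqrt(2))


# Hypothetical counts: a category held by 150 of 460 dogs,
# found in 70 dogs (or only 10 dogs) of a cluster of 88.
over = v_test(70, 88, 150, 460)
under = v_test(10, 88, 150, 460)
print(round(over, 1), round(under, 1), round(two_sided_p(1.96), 3))
```

A positive value marks over-representation and a negative value under-representation, and the last printed figure shows why 1.96 is used as the 5% threshold.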

A new RF was performed to create a model for predicting clusters (generated by HCPC) using the Idele scoring system. The Idele scoring system’s Se, Sp, and ACC for predicting clusters, when compared to HCPC clustering, were calculated. The training set used for this estimate was the same as the one employed for predicting AP/NAP.

Results

AP/NAP prediction

98 HDs (21.3% of the sample) did not pass the PT phase, and 134 HDs (29.1% of the sample) did not pass the T phase. 228 HDs (49.6% of the sample) passed both PT and T phases. Amongst them, a mean time of 78 ± 22 s was necessary to complete the PT phase, and 119 ± 12 s to complete the T phase, for a mean time of 3 min 16 s ± 26 s to complete the entire field trial.

76 HDs were assessed as AP by the two experts and represented 16.5% of the sample.

Out of the 92 HDs included in the test dataset, 88 of them had their AP/NAP score correctly predicted by the RF prediction model (Table 6). The model’s performance, including Se, Sp, and ACC, is presented in Table 6.

Table 6.

AP/NAP prediction performances, Idele scoring system’s Sp, Se, and ACC to predict AP/NAP (92 HDs) compared to expert’s assessment, from RF model

	NAP	AP
Predicted NAP	74	3
Predicted AP	1	14
Specificity (Sp)	96.1%
Sensitivity (Se)	93.3%
Accuracy (ACC)	95.7% (P value < 0.0004)
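
As a worked check of the three performance measures, the sketch below recomputes them from the four counts of Table 6. It assumes that the 15 HDs of the second row (1 + 14) are the HDs assessed as AP by the experts and that the 77 HDs of the first row (74 + 3) are those assessed as NAP; this reading of the table orientation is an assumption made for the example, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # AP dogs correctly predicted as AP
    specificity = tn / (tn + fp)   # NAP dogs correctly predicted as NAP
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


# Counts from Table 6 under the orientation assumed above.
se, sp, acc = classification_metrics(tp=14, fn=1, fp=3, tn=74)
print(f"Se = {se:.1%}, Sp = {sp:.1%}, ACC = {acc:.1%}")
```

With this reading, the computed values match the percentages reported in the table.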

AP characterization

RFE was run on a matrix of all 460 HDs and all 118 features in the Idele scoring system. Twenty-five features were retained using this function in relation to the T phase, with an accuracy of 92.9 ± 3.7% and a kappa of 0.76 ± 0.12. The 12 features which increase the probability of being assessed as AP are presented in Table 7, and the 13 features which decrease this probability are presented in Table 8. The top three features which increase AP probability were: “START_T=2. Exceeds axis” (+9.5%), “FOCUS=YES” (+7.5%), and “SEC15_T=Close to TL” (+7.3%). Inversely, the top features decreasing AP probability were: “MOV_T=Not filled” (−8.3%), “BITE=YES” (−5.0%), and “MOV_W=Not filled” and “GRP_W=Split” (−4.5% each).

Table 7.

12 features selected by the RFE increasing the HD’s probability to score AP

Features (Item = category (item no.))	AP probability if feature not expressed (%)	AP probability if feature expressed (%)	AP probability variation when feature expressed (%)
START_T = 2. Exceeds axis (7)	9.2	18.7	+9.5
FOCUS = YES (24)	5.5	13.1	+7.5
SEC15_T = Close to TL (10)	14.8	22.1	+7.3
INTER = No confrontation (26)	19.3	24.8	+5.5
DST_W = Close to TL (21)	8.9	14.0	+5.1
MOTIV = YES (25)	7.6	12.4	+4.7
GRP_T = Flock grouped (9)	9.6	13.4	+3.8
PHASE_2 = YES (28)	22.7	25.9	+3.2
T_STOP = T complete (27)	17.3	19.9	+2.6
MOV_T = No movement (8)	24.5	26.3	+1.8
POS_S2 = Position in relation to TL (14)	17.8	19.5	+1.7
POS_S1 = Position in relation to TL (11)	20.7	22.2	+1.5

Table 8.

13 features selected by the RFE decreasing the HD’s probability to score AP

Features (Item = category (item no.))	AP probability if feature not expressed (%)	AP probability if feature expressed (%)	AP probability variation when feature expressed (%)
MOV_T = Not filled (8)	19.0	10.7	−8.3
BITE = YES (22)	21.8	16.3	−5.0
MOV_W = Not filled (18)	20.7	16.2	−4.5
GRP_W = Split (20)	17.4	12.9	−4.5
POS_S2 = Bites, Hits or Chases a sheep (14)	19.0	14.5	−4.4
GRP_S1 = Split (13)	13.9	10.6	−3.3
FOCUS = NO (24)	11.6	8.8	−2.8
AXIS_S2 = Not filled (15)	18.5	16.1	−2.5
FLW_W = Does not follow (19)	15.6	13.3	−2.3
GRP_T = Flock split (9)	15.3	13.5	−1.7
POS_S1 = Still or doing something else (11)	13.7	12.0	−1.7
AXIS_S1 = Not filled (12)	22.7	21.6	−1.1
POS_S2 = Still or doing something else (14)	12.6	11.5	−1.1

A new RF prediction model was developed using these 25 features. The performance of this new model, as shown in Table 9, was lower compared to the initial model which used 118 features. The new model showed a decrease in sensitivity of 6.6 percentage points, specificity of 1.3 percentage points, and accuracy of 2.2 percentage points.

Table 9.

Sp, Se, and ACC to predict AP/NAP (92 HDs) compared to expert’s assessment from RF model using the 25 features identified by RFE

Specificity (Sp)	94.8%
Sensitivity (Se)	86.7%
Accuracy (ACC)	93.5% (P value < 0.005)

Environmental effects and their association with AP

Four environmental effects showed statistically significant associations with AP/NAP: the field trial session (P value < 0.0005), the owner’s experience with HD (P value < 0.05), the conditions of the dog’s initial contact with livestock (P value < 0.05), and the specific livestock with which the HD was accustomed to working (P value < 0.007). These results indicate that HDs were more likely to express AP during our trial if their owners had previous experience with HDs, their initial contact with livestock was unrestricted (without a leash or fence), or if they were accustomed to working with sheep (the species used during the trial) rather than other species.

No significant association with AP/NAP was found for sex, age, how the HD was acquired, puppy choice criteria, owner’s participation in training courses, housing, or frequency of training or working sessions.

Inter-evaluator agreement

Fleiss’ kappas for items 3 to 28 are listed in Table 10. All kappas were statistically significant (z-scores > 1.96 and P values < 0.05). This demonstrates that, for all items, the agreement was significantly different from what would be expected by chance. Kappas ranged from 0.55 (NB_BITES) to 0.83 (PHASE_1), and the mean Fleiss’ kappa for all 26 items reached 0.70. Items 17 (START_W) and 23 (NB_BITES) were the only ones with a “moderate” agreement (0.41 ≤ κ ≤ 0.60), whereas the other items, as well as the general agreement between the three evaluators, reached a “substantial” agreement (0.60 < κ ≤ 0.80), according to Landis and Koch (1977). The PHASE_1 item even reached an “almost perfect” agreement (> 0.80) at 0.83.

Table 10.

Fleiss kappa for items of PT and Test (T) phases of Idele scoring system

Item (no.)	Kappa	z score	P value
PT1 (3)	0.67	24.6	<0.0001
PT2 (4)	0.74	17.4	<0.0001
PT_STOP (5)	0.74	20.2	<0.0001
PHASE_1 (6)	0.83	14.4	<0.0001
START_T (7)	0.69	24.1	<0.0001
MOV_T (8)	0.63	18.6	<0.0001
GRP_T (9)	0.61	17.4	<0.0001
SEC15_T (10)	0.69	19.2	<0.0001
POS_S1 (11)	0.69	23.0	<0.0001
AXIS_S1 (12)	0.66	21.7	<0.0001
GRP_S1 (13)	0.73	18.1	<0.0001
POS_S2 (14)	0.68	21.6	<0.0001
AXIS_S2 (15)	0.64	20.6	<0.0001
GRP_S2 (16)	0.74	18.3	<0.0001
START_W (17)	0.59	22.6	<0.0001
MOV_W (18)	0.73	21.1	<0.0001
FLW_W (19)	0.66	22.5	<0.0001
GRP_W (20)	0.77	20.4	<0.0001
DST_W (21)	0.72	21.5	<0.0001
BITE (22)	0.72	12.6	<0.0001
NB_BITES (23)	0.55	10.6	<0.0001
FOCUS (24)	0.76	21	<0.0001
MOTIV (25)	0.75	19.7	<0.0001
INTER (26)	0.72	21.9	<0.0001
T_STOP (27)	0.71	23.8	<0.0001
PHASE_2 (28)	0.76	13.3	<0.0001
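
The summary figures drawn from Table 10 can be recomputed with the short sketch below, which averages the 26 kappas and labels each one with the Landis and Koch (1977) bands. The kappa values are copied from the table; the function name `landis_koch` and the variable names are hypothetical.

```python
# Fleiss' kappa per item, copied from Table 10 (items 3 to 28).
KAPPAS = {
    "PT1": 0.67, "PT2": 0.74, "PT_STOP": 0.74, "PHASE_1": 0.83,
    "START_T": 0.69, "MOV_T": 0.63, "GRP_T": 0.61, "SEC15_T": 0.69,
    "POS_S1": 0.69, "AXIS_S1": 0.66, "GRP_S1": 0.73, "POS_S2": 0.68,
    "AXIS_S2": 0.64, "GRP_S2": 0.74, "START_W": 0.59, "MOV_W": 0.73,
    "FLW_W": 0.66, "GRP_W": 0.77, "DST_W": 0.72, "BITE": 0.72,
    "NB_BITES": 0.55, "FOCUS": 0.76, "MOTIV": 0.75, "INTER": 0.72,
    "T_STOP": 0.71, "PHASE_2": 0.76,
}


def landis_koch(kappa):
    """Label a kappa value with the Landis and Koch (1977) agreement band."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "slight" if kappa >= 0 else "poor"


mean_kappa = sum(KAPPAS.values()) / len(KAPPAS)
moderate = [item for item, k in KAPPAS.items() if landis_koch(k) == "moderate"]
almost_perfect = [item for item, k in KAPPAS.items() if landis_koch(k) == "almost perfect"]
print(round(mean_kappa, 2), moderate, almost_perfect)
```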

HD clusters

The first 10 dimensions of the MCA, conducted on 460 HDs and the 28 items (as presented in Tables 2 to 5), accounted for 47.9% of the variance in the data and were retained for subsequent analyses. Amongst these 10 dimensions, two played a significant role. The first dimension explained 18.6% of the variance and was characterized by the “No test” category for all items in the T phase. The second dimension explained 12.2% of the variance and was associated with items that were “Not filled” in the T phase. Each of the other dimensions explained less than 3.5% of the variance.

The gap statistic method, applied to the results of the HCPC, was used to determine the optimal number of clusters in our data, and suggested 10 clusters as a possible solution. Once the clusters were generated, they were interpreted based on the items that best define each cluster. Each cluster is characterized by a category of these items. To ensure correct interpretation, only categories of items with a minimum v-test of 4 were retained. Only categories of items that showed significant differences from the overall population (P value < 0.05) were considered for cluster interpretation.

Tables 11 to 13 show the 10 clusters. For each cluster, they give the name attributed to it, the number of HDs in it, and the categories of the items retained to describe it, with their associated v-test values.

Table 11.

Description of clusters 1 to 4 from field trial performances generated by HCPC: Names, HD numbers, and categories of items (item no.) which characterize the cluster with their associated v-tests

Cluster 1		Cluster 2		Cluster 3		Cluster 4
Complete trial: natural ability and power	v-test	Complete trial: circles around the flock	v-test	Complete trial: lack of natural ability	v-test	Complete trial: stubborn flock	v-test
N = 88	v-test	N = 59	v-test	N = 54	v-test	N = 5	v-test
POS_S2 = Position in relation to TL (14)	13.3	T_STOP = T complete (27)	9.3	MOV_T = Not filled (8)	10.1	POS_S2 = Interacts (14)	6.6
MOTIV = YES (25)	12.5	PHASE_2 = YES (28)	9.3	POS_S1 = Position in relation to TL (11)	9.9	POS_S1 = Interacts (11)	6.3
GRP_W = Grouped (20)	12.5	FLW_W = Does not follow (19)	8.9	MOTIV = YES (25)	9.3
START_W = Goes to flock (17)	12.3	MOTIV = YES (25)	8.6	START_T = 0. Does not reach axis (7)	9.2
FOCUS = YES (24)	12.1	INTER = No confrontation (26)	7.6	T_STOP = T complete (27)	8.8
T_STOP = T complete (27)	11.8	GRP_S2 = Grouped (16)	7.3	PHASE_2 = YES (28)	8.8
PHASE_2 = YES (28)	11.8	POS_S1 = Circles around the flock (11)	7.2	GRP_S1 = Grouped (13)	7.9
DST_W = Close to TL (21)	11.7	DST_W = Far from TL (21)	7.0	FOCUS = YES (24)	7.6
SEC15_T = Close to TL (10)	11.2	START_W = Goes to flock (17)	6.8	GRP_W = Grouped (20)	7.3
GRP_S2 = Grouped (16)	11.0	POS_S2 = Circles around the flock (14)	6.5	GRP_S2 = Grouped (16)	7.2
POS_S1 = Position in relation to TL (11)	10.7	GRP_W = Grouped (20)	6.2	POS_S2 = Position in relation to TL (14)	6.0
START_T = 2. Exceeds axis (7)	10.6	START_T = 3. Circles around the flock (7)	5.7	AXIS_S1 = 0. Does not reach axis (12)	5.7
GRP_T = Flock grouped (9)	10.5	MOV_W = Circles around (18)	5.6	INTER = No confrontation (26)	5.7
FLW_W = Follows (19)	10.3	GRP_S1 = Grouped (13)	5.5	START_W = Goes to flock (17)	5.3
GRP_S1 = Grouped (13)	10.1	SEC15_T = Close to TL (10)	5.5	DST_W = Close to TL (21)	5.2
MOV_W = Flanks (18)	9.1	FOCUS = YES (24)	5.3	PHASE_1 = YES (6)	4.9
INTER = No confrontation (26)	8.2	FOCUS = NO (24)	5.3	PT_STOP = PT complete (5)	4.8
AXIS_S2 = 1. Reaches axis (15)	7.4	PHASE_1 = YES (6)	5.2	FLW_W = Passes TL (19)	4.6
AXIS_S1 = 2. Exceeds axis (12)	7.2	PT_STOP = PT complete (5)	5.1	START_W = Motionless (17)	4.2
AXIS_S2 = 2. Exceeds axis (15)	7.1	POS_S1 = Bites, Hits or Chases a sheep (11)	5.0	MOV_W = Circles around and goes straight (18)	4.1
MOV_T = At least 1 sheep moving (8)	6.9	MOV_T = At least 1 sheep moving (8)	4.8
MOV_T = No movement (8)	6.7	START_W = Circles around the flock (17)	4.8
PHASE_1 = YES (6)	6.6	GRP_W = Split (20)	4.7
PT_STOP = PT complete (5)	6.5	PT2 = Spontaneously goes to the flock (4)	4.2
AXIS_S1 = 1. Reaches axis (12)	6.0	MOV_W = Flanks and circles around (18)	4.1
FLW_W = Passes TL (19)	4.8
PT2 = Spontaneously goes to the flock (4)	4.6
PT1 = 2. Exceeds axis (3)	4.1

Table 13.

Description of clusters 8 to 10 from field trial performances generated by HCPC: Names, HD numbers, and categories of items (item no.) which characterize the cluster with their associated v-test

Cluster 8		Cluster 9		Cluster 10
Test stopped: lack of motivation	v-test	PT stopped: predatory behavior	v-test	PT stopped: no motivation	v-test
N = 44	v-test	N = 40	v-test	N = 58	v-test
GRP_T = Not filled (9)	13.9	T_STOP = No test (27)	11.6	PT_STOP = HD tries/Leaves the plot (5)	16.1
SEC15_T = Not filled (10)	13.6	MOV_W = No test (18)	11.6	T_STOP = No test (27)	14.5
POS_S1 = Not filled (11)	11.1	INTER = No test (26)	11.6	MOV_W = No test (18)	14.5
GRP_S1 = Not filled (13)	10.9	MOTIV = No test (25)	11.6	INTER = No test (26)	14.5
INTER = Not filled (26)	10.7	FOCUS = No test (24)	11.6	MOTIV = No test (25)	14.5
MOTIV = Not filled (25)	10.7	DST_W = No test (21)	11.6	FOCUS = No test (24)	14.5
FOCUS = Not filled (24)	10.7	GRP_W = No test (20)	11.6	DST_W = No test (21)	14.5
DST_W = Not filled (21)	10.7	FLW_W = No test (19)	11.6	GRP_W = No test (20)	14.5
GRP_W = Not filled (20)	10.7	START_W = No test (17)	11.6	FLW_W = No test (19)	14.5
FLW_W = Not filled (19)	10.7	GRP_S2 = No test (16)	11.6	START_W = No test (17)	14.5
START_W = Not filled (17)	10.4	AXIS_S2 = No test (15)	11.6	GRP_S2 = No test (16)	14.5
T_STOP = HD loses interest (27)	9.9	POS_S2 = No test (14)	11.6	AXIS_S2 = No test (15)	14.5
GRP_S2 = Not filled (16)	9.8	GRP_S1 = No test (13)	11.6	POS_S2 = No test (14)	14.5
POS_S2 = Not filled (14)	9.8	AXIS_S1 = No test (12)	11.6	GRP_S1 = No test (13)	14.5
MOV_W = Not filled (18)	9.6	POS_S1 = No test (11)	11.6	AXIS_S1 = No test (12)	14.5
MOV_T = Not filled (8)	8.9	SEC15_T = No test (10)	11.6	POS_S1 = No test (11)	14.5
START_T = Does not go to the flock (does something else) (7)	8.5	GRP_T = No test (9)	11.6	SEC15_T = No test (10)	14.5
AXIS_S2 = Not filled (15)	7.9	MOV_T = No test (8)	11.6	GRP_T = No test (9)	14.5
PHASE_2 = NO (28)	7.8	START_T = No test (7)	11.6	MOV_T = No test (8)	14.5
AXIS_S1 = Not filled (12)	7.3	PHASE_1 = NO (6)	11.6	START_T = No test (7)	14.5
T_STOP = HD tries/Leaves the plot (27)	5.2	PT_STOP = At least 1 chase (5)	8.9	PHASE_1 = NO (6)	14.5
START_T = Tries/Leaves the plot (7)	5.0	PHASE_2 = NO (28)	7.3	PT2 = Not concerned (4)	12.6
START_T = Does not go to the flock (no movement) (7)	4.6	PT_STOP = Disturbs flock (5)	7.2	PT1 = Tries/Leaves the plot (3)	10.6
PHASE_1 = YES (6)	4.3	PT1 = 0. Does not reach axis (3)	4.4	PHASE_2 = NO (28)	9.1
PT_STOP = PT complete (5)	4.3	PT_STOP = Repeated predatory bites (5)	4.1	BT2 = Pulls back, resists (2)	5.4
				BITE = NO (22)	4.4
				PT1 = Does not go to the flock (does something else) (3)	4.2

Table 12.

Description of clusters 5 to 7 from field trial performances generated by HCPC: Names, HD numbers, and categories of items (item no.) which characterize the cluster with their associated v-tests

Cluster 5		Cluster 6		Cluster 7
Complete trial: limited keenness	v-test	Test stopped: too many predatory bites	v-test	Test stopped: predatory behavior	v-test
N = 22	v-test	N = 18	v-test	N = 72	v-test
POS_S1 = Still or doing something else (11)	8.6	T_STOP = Repeated predatory bites (27)	7.8	INTER = Not filled (26)	14.4
MOTIV = NO (25)	8.4	BITE = YES (22)	7.3	MOTIV = Not filled (25)	14.4
POS_S2 = Still or doing something else (14)	7.5	POS_S1 = Bites, Hits or Chases a sheep (11)	7.1	FOCUS = Not filled (24)	14.4
FLW_W = Does not follow (19)	7.3	INTER = Not filled (26)	6.5	DST_W = Not filled (21)	14.4
START_W = Loses interest (17)	6.8	MOTIV = Not filled (25)	6.5	GRP_W = Not filled (20)	14.4
DST_W = Far from TL (21)	6.7	FOCUS = Not filled (24)	6.5	FLW_W = Not filled (19)	14.4
T_STOP = T complete (27)	5.3	DST_W = Not filled (21)	6.5	START_W = Not filled (17)	14.3
PHASE_2 = YES (28)	5.3	GRP_W = Not filled (20)	6.5	POS_S2 = Not filled (14)	13.7
GRP_W = Grouped (20)	5.0	FLW_W = Not filled (19)	6.5	GRP_S2 = Not filled (16)	13.4
GRP_S2 = Grouped (16)	4.5	POS_S2 = Bites, Hits or Chases a sheep (14)	6.1	MOV_W = Not filled (18)	12.8
GRP_S1 = Grouped (13)	4.1	MOV_W = Not filled (18)	5.8	AXIS_S2 = Not filled (15)	10.4
		AXIS_S2 = Not filled (15)	5.1	GRP_S1 = Not filled (13)	10.3
		START_W = Not filled (17)	4.9	PHASE_2 = NO (28)	10.3
		PHASE_2 = NO (28)	4.7	POS_S1 = Not filled (11)	9.9
		GRP_S1 = Grouped (13)	4.2	SEC15_T = Far from TL (10)	8.4
				AXIS_S1 = Not filled (12)	8.4
				T_STOP = Unworkable test (27)	7.8
				T_STOP = At least 1 chase (27)	7.2
				GRP_T = Flock split (9)	7.0
				START_T = 0. Does not reach axis (7)	6.4
				PHASE_1 = YES (6)	5.8
				PT_STOP = PT complete (5)	5.8
				MOV_T = Not filled (8)	4.1

Using the Idele scoring system, the RF model correctly assigned 93.5% (ACC) of the 92 HDs in the test dataset to their cluster among these 10 clusters.

Discussion

Assessment of the feasibility of the field trial

The field trial implemented in this study successfully achieved its objectives in terms of simplicity and short duration. The test was conducted with minimal requirement for personnel and equipment, making it feasible using common farm equipment such as portable netting to enclose the trial area and a trailer to record videos from an elevated viewpoint. Testing each HD took less than 4 min, which made it easy to run the field trial for a relatively large number of HDs at any one venue.

Since AP/NAP phenotype is binary in nature, its assessment was not overly complex. However, directly applying the Idele scoring system on the field during the trial was not feasible because the substantial number of items involved, and the fact that not all items could be scored simultaneously (separate, successive, or overlapping phases), meant that examiners had to fill in their score by carefully watching videos, which often required multiple viewings. While setting up the trial itself was straightforward, accurately using the Idele scoring system without pausing the video or consulting the scoring rules for each item required a significant investment in time and comprehension.

AP/NAP definition and prediction

The definition of AP was obtained using the expertise of two experienced professionals with several decades of working with HDs. This definition of AP was compared with the herding traits described in the available literature and discussed with vets specializing in canine behavior and with members of the FUCT, the AFBC, and the SCC. These organizations will be involved in setting up the future HD breeding program and defining its goals. While the opinion of the two experienced professionals may not fully correspond to all the requirements of French farmers, their expertise in recognizing HDs exhibiting AP, and their AP/NAP assessments, which are used as the Gold Standard throughout the analyses, are invaluable.

It is essential that HD breeders have faith in this method (i.e., definition of AP, field trial, Idele scoring system) because if not, they might question the importance of collecting data, reducing the number of HDs assessed, which in turn would have negative consequences for the resulting genetic analyses. The support of the HD sector in this project is essential.

The Idele scoring system was successful in predicting AP/NAP phenotype. The accuracy, sensitivity, and specificity were all above 90%, which is satisfactory considering the inherent difficulty of predicting a phenotype based on multiple items, particularly when the phenotype is relatively rare (16.5% in our case) (Adhikari et al., 2021). The first objective of the study was achieved because the experts’ assessments were quantified using the Idele scoring system.

AP characterization

Another objective of this study was to provide a precise characterization of AP via the Idele scoring system. While partial dependencies gave a clear and understandable definition of AP, the size and construction of the Idele scoring system made it impractical to use a multivariate prediction model or to calculate odds ratios. The numerous items involved and their intercorrelations made it challenging to derive specific rules for determining AP/NAP.

The interpretation of partial dependencies made it possible to define an AP dog as an HD which completed both PT and T phases, exceeded the axis while the flock was not moving at the beginning of the T phase, maintained the flock in a group close to TL for 15 s, was positioned according to TL during both rotation phases, was able to move the flock close to TL without splitting it, and stayed focused and motivated throughout the whole T, without uncontrolled biting or unnecessarily confronting the animals.

These results were obtained from the analysis of 460 HDs. However, to establish a precise definition with comprehensive guidelines for assessing AP/NAP based on the Idele scoring system, a larger number of HDs will need to be tested. These results are encouraging in the context of HD selection because the items observed align with traits that are recommended for consideration when assessing HDs for selective breeding (Arvelius et al., 2013). Out-run, natural ability and power, as defined in Arvelius et al. (2013), are very important in the definition of AP, respectively, through the notions of “exceeding axis”, “maintaining flock in a group” and “being able to move the flock”.

Environmental factors associated with AP

Sex and age had no significant influence on AP/NAP assessment. Other studies showed little or no sex effect on HDs’ performance (Isnard, 2005). Some traits in BCs were examined at 6, 12, and 18 to 24 mo of age by Riemer et al. (2016): fearfulness, aggression toward people, responsiveness to training, and aggression toward animals. They increased with age in that study, particularly between 6 and 12 mo. The consistency of AP/NAP across ages could not be estimated because the HDs were only tested once.

The effect of the field trial session on AP/NAP could be attributed to three factors. (i) The field trials were conducted under various weather conditions; however, these conditions were not recorded and their effect could therefore not be estimated. Two studies did show a significant effect of weather on working dog performance during trials (in this case hunting dogs): wind/rain (Karjalainen et al., 1996) or snow (Liinamo et al., 1997). None of our trials were conducted in snowy conditions, and the field trial sessions were predominantly scheduled during spring and autumn to avoid unfavorable periods. Weather conditions could have had an impact on HD performance during our field trial sessions, but this effect could not be tested. (ii) A significant variation in AP/NAP proportions was observed between different locations, resulting in some sessions having no AP HDs tested while others had up to 50% AP HDs among the tested HDs. This observation could potentially be attributed to a maternal effect, the significance of which in HD performance is acknowledged (Courreau, 1991). Courreau (1991) highlighted the influence of the environment on dogs’ performance in competition, particularly in relation to environmental conditions both during pregnancy and within the litter. Such an effect is plausible here because HDs from the same litter were predominantly tested at the same location. In consequence, this could create a “lineage effect” and contribute to a greater similarity of behaviors within a particular location. Paternal and maternal permanent environmental random effects could not be studied due to the low number of offspring tested per father or mother (mostly one offspring per parent). Given the diverse utilizations of HDs in France, and the variations in type of livestock (species, breeds, and flock managements), disparities in HD behaviors between locations may emerge, with certain HDs better suited to excel in our field trial than others. 
Further research is necessary to investigate this effect and determine whether our study is influenced by a bias associated with the lineage effect. (iii) The flocks used for testing the HDs could contribute to variations in their performance. Our field trial required consistency in behavior across flocks, balancing animals that were neither too unfamiliar nor too experienced with the testing protocol. It is worth noting that sheep have the ability to learn and engage in executive cognitive tasks, as demonstrated in a previous study (Morton and Avanzo, 2011). During a field trial, the same flock was used for multiple HDs within a session. Interestingly, as the number of tested HDs increased, the sheep spontaneously began exhibiting behaviors aimed at avoiding confrontation with the HD, such as moving toward the TL without attempting to escape. This made the task easier for the HDs. However, in certain sessions, some flocks displayed more aggressive behaviors toward the HDs and engaged in frequent confrontations, which may have caused greater challenges for the HDs tested with those flocks. This disparity in the behavior of flocks during the trials was expected from the outset of the project, but no feasible solution was found to prevent it. Using a single flock for testing all HDs would have created an “expert” flock that was highly accustomed to human interaction, thereby obscuring the true impact of the HD on the animals. Assessing the suitability of a flock before organizing a session using a Gold Standard HD could have been considered. However, due to the difficulties of organizing sessions, they were held on farms that were willing to host a full day of HD trials, and flocks could not be individually chosen for each session.

The results indicated that HDs were more likely to be AP if their owner had previous experience with another HD. This finding aligns with a study that suggests dogs owned by individuals with less experience are more prone to exhibiting behavioral issues such as dominant-type aggression and excitability (Stephen and Ledger, 2007). Additionally, research has shown that the behaviors of working dogs (such as the canine unit of the Paris Firefighters Brigade) are influenced by their owner’s personality (Hoummady et al., 2016). The observation of an “owner effect” on HD performance therefore comes as no surprise. However, further studies specifically focusing on HDs are needed to gain a better understanding of this observation.

To the best of our knowledge, no study has been conducted on the influence of HD’s initial contact with livestock on herding performance. Four hypotheses can be proposed to explain the finding that AP HDs were more likely to have had their first contact with livestock in an unrestricted environment rather than on leash or with fences. 1) It is possible that this freedom during the initial contact reflects a natural tendency in owners to allow more autonomy in herding behaviors throughout the HD’s career, resulting in better scores in our study. To verify this hypothesis, a study similar to the one mentioned earlier (Hoummady et al., 2016) could be conducted to examine if the owner’s personality has long-term impacts on HD performance. 2) The association between AP and the conditions of initial contact with livestock could arise because owners who allowed unrestricted contact were also those with better criteria for choosing puppies. Due to limited variability in the choice criteria reported by owners in the questionnaire, this hypothesis could not be tested and may require a larger sample size of HDs. 3) It can be assumed that for a young dog, learning under unrestricted conditions is less stressful and more conducive to their well-being compared to learning under constraints. The freedom during the initial contact may enable the HD to adapt more easily to the novelty of interacting with a flock at a few months old. This positive reinforcement-based learning approach could help reduce fear and lack of keenness. Two studies examining the impact of training methods on canine behavior are in accordance with this hypothesis. The first study found that positive reinforcement yielded significantly better responses to commands such as “Sit” and “Come” compared to electric collar training (China et al., 2020). 
The second study concluded that positive reinforcement is less stressful and potentially promotes better well-being for dogs compared to negative reinforcement (Deldalle and Gaunet, 2014). 4) It is possible that owners of HDs with better herding abilities are more inclined to allow their HDs to freely interact with flocks because they have less concern about the HDs biting animals or scattering the flock. In our case, training HDs in an unrestricted environment could potentially lead to better results when compared to constraints such as fences or leashes.

The observation that HDs accustomed to working with sheep were more likely to score AP than those working with cattle could be attributed to their familiarity with sheep, the species used in our trial. To the best of our knowledge, no study has been conducted on differences in herding performance according to the species of livestock. Therefore, we can only speculate that HDs may perform better in our field trial when working with the animals they are accustomed to. To investigate this further, a similar trial could be conducted with cattle to determine whether HDs familiar with working this species perform better.

Inter-evaluator agreement

All items except two exhibited at least “substantial” agreement, and the overall agreement among the three evaluators reached 0.70, which is also considered “substantial” according to Landis and Koch (1977). This finding is encouraging, considering that the Idele scoring system consists of 28 qualitative items, and that two of the evaluators had limited training and no prior experience with HDs. This agreement could be improved through evaluator training courses. The current format of the Idele scoring system could be used by individuals who have attended these training courses to assess HDs, regardless of their expertise in HD behavior. This result aligns with a previous study that examined the ability of individuals with varying levels of experience with dogs to interpret canine behaviors on videos, which found no significant difference between the groups (Tami and Gallagher, 2009).
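As an illustration of how such an agreement value can be obtained, the following minimal Python sketch computes Fleiss' kappa (Fleiss, 1971) for one categorical item scored by several evaluators and maps the result onto the Landis and Koch (1977) labels. The function names and the example ratings are assumptions made for illustration only; they are not the authors' code or data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each a list of one label per rater."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    p_i = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Expected agreement by chance, from the overall category proportions.
    p_j = [sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        return float("nan")  # only one category was ever used: kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch_label(kappa):
    """Verbal interpretation of kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical scores given by three evaluators to five dogs on one item.
example = [
    ["yes", "yes", "yes"],
    ["no", "no", "no"],
    ["yes", "yes", "no"],
    ["no", "no", "no"],
    ["yes", "yes", "yes"],
]
kappa = fleiss_kappa(example)
print(round(kappa, 2), landis_koch_label(kappa))
```

With this interpretation scale, the overall value of 0.70 reported here falls in the 0.61 to 0.80 band, hence the label “substantial”.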

10 clusters

Nine of the ten generated clusters appear to have meaningful interpretations from a herding behavior perspective. Cluster 4, consisting of HDs that faced reluctant sheep, could not be retained because of the small number of HDs in it. It is worth noting that four of the five HDs included in this cluster were tested in the same field trial session, suggesting a strong flock effect that may have influenced the test conditions and compromised the reliability of the cluster.

Some items included in our trial are comparable to traits presented by Arvelius et al. (2013), for which heritability estimates were moderate, for instance 0.33 for out-run, 0.48 for natural ability, and 0.30 for power. This leads us to assume that our AP could be relevant in an HD breeding program, but these results should be confirmed with genetic studies (estimation of heritability and breeding values). However, given the low proportion of AP HDs and the heterogeneity of the NAP HD clusters (such as lack of natural ability or keenness, and predatory behavior), it could be more informative to use HD clusters instead of AP/NAP in a breeding program. The variations observed within the NAP classification may have different underlying genetic causes, and it is possible that certain clusters are characterized by items heritable enough to enable selective breeding. These clusters could be assessed using six ratings calculated for the six herding behaviors included in the definition of AP (HD’s keenness when interacting with animals, HD’s focus on the targeted animals, HD’s interaction with handlers in the presence of the test flock, HD’s predatory behavior during the test, HD’s ability to contain the test flock for the handlers, and HD’s ability to move the test flock for the handlers), as presented in this article. These ratings could be derived from numerical scores assigned to the items of the Idele scoring system that are relevant to each of these six herding behaviors. Further analysis is required to ensure that these ratings accurately predict a cluster and that these clusters are meaningful in the context of an HD breeding program.
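One possible way to derive such ratings is sketched below in Python: each item is mapped to one of the six herding behaviors, and the rating for a behavior is the mean of the numerical scores of its items. The item names, the item-to-behavior mapping, the score scale, and the use of a simple mean are all hypothetical choices made for illustration; the article does not specify how the ratings would be computed.

```python
from statistics import mean

# The six herding behaviors included in the definition of AP.
BEHAVIORS = (
    "keenness",
    "focus",
    "interaction_with_handler",
    "predatory_behavior",
    "containing",
    "moving",
)

# Hypothetical mapping from scoring-system items to behaviors.
ITEM_TO_BEHAVIOR = {
    "item_a": "keenness",
    "item_b": "keenness",
    "item_c": "focus",
    "item_d": "interaction_with_handler",
    "item_e": "predatory_behavior",
    "item_f": "containing",
    "item_g": "moving",
}


def behavior_ratings(item_scores, item_to_behavior=ITEM_TO_BEHAVIOR):
    """Average the numerical item scores of one dog within each behavior.

    Behaviors with no scored item get None rather than an invented value.
    """
    grouped = {behavior: [] for behavior in BEHAVIORS}
    for item, score in item_scores.items():
        behavior = item_to_behavior.get(item)
        if behavior is not None:
            grouped[behavior].append(score)
    return {b: (mean(s) if s else None) for b, s in grouped.items()}


# Hypothetical numerical scores for one dog (scale assumed, not from the paper).
dog_scores = {"item_a": 2, "item_b": 3, "item_c": 1, "item_e": 0, "item_g": 2}
print(behavior_ratings(dog_scores))
```

The resulting six-value profile per dog could then be used as input to a cluster assignment, which is the step the text identifies as still requiring validation.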

Selection perspective

Lord et al. (2016) demonstrate that the selection of HD breeds has been based on their herding behaviors, which diminish when the dogs are not bred selectively. This highlights the genetic component in the transmission of these traits.

Variation in behavior between breeds has been reported. One study shows, for instance, that the amount of aggression directed toward strangers, owners, and dogs differs significantly among 30 breeds (Duffy et al., 2008), and another suggests large differences between breeds in playfulness, curiosity/fearlessness, sociability, and aggressiveness (Svartberg, 2006). Behavioral variations among HD breeds are frequently mentioned in different studies. For example, BCs are known for their high energy and focused working style, characterized by the distinctive eye and stalking behaviors, while breeds like the Beauceron or Pyrenean Sheepdog may exhibit less precocity and are more inclined to trot rather than sprint (Conessa, 2013). These breed-specific differences are often debated among HD users, and conducting our field trial with diverse HD breeds would provide valuable insights to either support or challenge these arguments.

The field trial and the scoring system developed here might provide a basis for selection for AP. It would be valuable to conduct genetic analyses (estimation of heritability and breeding values) to see whether genetic evaluation for herding traits would be feasible in a selective breeding context. Although our results were based on 460 performances, which was sufficient to validate the Idele scoring system, this number is not enough for conducting genetic studies. Courreau (1998) suggests that several thousand performances are needed to perform quantitative genetic studies in HDs. Thus, further testing would be required, involving a larger number of HDs with sufficient common ancestors (known pedigrees), to reach a minimum of 1,000 HDs tested to establish a robust genetic foundation. It would also be interesting to conduct genomic analyses to identify variants associated with herding traits. At present, no genes responsible for herding behaviors have been identified either across breeds or specifically in BCs. However, loci associated with various traits, including fear-memory formation, pain perception, behavioral excitability, and morphology, have been identified on different chromosomes in Working Australian Kelpies (Arnott et al., 2015).

Conclusion

All three aims of this study have been reached. First, the new field trial using the Idele scoring system, presented in this paper, detected AP and NAP HDs with an accuracy of 95.7% (sensitivity 93.3%, specificity 96.1%). Secondly, after testing 460 BCs, AP could be characterized through the Idele scoring system, using an RFE run on all features of the scoring system. Thirdly, different environmental effects on AP/NAP were studied, and four of them were statistically significant (the field trial session, the owner’s prior experience with HDs, the conditions of the dog’s initial contact with livestock, and the type of livestock with which the HD was accustomed to working). These results are encouraging for an HD breeding program based on AP. For selective breeding, several issues still need to be addressed, such as collecting sufficient data and estimating genetic parameters and breeding values.
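For readers less familiar with the performance metrics summarized above, the short Python sketch below shows how sensitivity (Se), specificity (Sp), and accuracy (ACC) follow from a confusion matrix in which the experts' assessment is the gold standard and AP is the positive class. The counts used in the example are illustrative assumptions, not data taken from this study; they were chosen only because they are numerically consistent with the reported percentages.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp: AP dogs classified AP        fn: AP dogs classified NAP
    tn: NAP dogs classified NAP      fp: NAP dogs classified AP
    """
    total = tp + fn + tn + fp
    return {
        "Se": tp / (tp + fn),      # share of true AP dogs that were detected
        "Sp": tn / (tn + fp),      # share of true NAP dogs that were detected
        "ACC": (tp + tn) / total,  # share of all dogs classified correctly
    }


# Illustrative counts only (assumed, not reported in the article).
metrics = diagnostic_metrics(tp=14, fn=1, tn=74, fp=3)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Because AP dogs are a minority, accuracy alone can look high even for a poor classifier, which is why Se and Sp are reported alongside it.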

Supplementary Material

skae157_suppl_Supplementary_Materials

skae157_suppl_supplementary_materials.docx (56.8 KB, docx)

Acknowledgments

The authors thank Idele and EnvA colleagues (Dr. Marine Driant) for their participation in the project. They warmly thank Alisson Stocchetti for her help with statistical matters throughout the project. They also thank the other partners involved in this project: FUCT, AFBC, INRAE, Cani-DNA, and SCC. The departmental associations of HD users are acknowledged for their investment throughout the project in recruiting dogs and allowing trial sessions to be held, as are all the farmers who participated in these sessions. The authors thank Laurent Journaux (currently Director of France Génétique Elevage) for believing in this project and for his contribution to its achievement. Gillian Howell Hugo (AFBC) is acknowledged for proofreading this article. Institut Carnot France Futur Elevage and the French Ministry of Agriculture are acknowledged for their financial support.

Glossary

Abbreviations

ACC: accuracy
AFBC: Association Française du Border Collie (French Border Collie Association)
AP: adequate phenotype
BC: Border Collie
BT: before testing
CRB: Biological Resource Center, Cani-DNA
EBVs: estimated breeding values
EnvA: Ecole nationale vétérinaire d’Alfort (National veterinary school of Alfort)
FUCT: Fédération des Utilisateurs de Chiens de Troupeaux (Federation of Herding dog Users)
h²: heritability
HCPC: hierarchical clustering on principal components
HD: herding dog
HTC: herding trait characterization
Idele: Institut de l’Elevage (French livestock institute)
INRAE: National Research Institute for Agriculture, Food and Environment
κ: Fleiss kappa
LOF: Livre des Origines Français (French stud-book)
MCA: Multiple Correspondence Analysis
NAP: non-adequate phenotype
PT: pre-test
r: repeatability
RF: random forest
RFE: recursive feature elimination
SCC: Société Centrale Canine (French Kennel Club, affiliated to the Fédération Cynologique Internationale, FCI)
Se: sensitivity
Sp: specificity
T: test
TL: test leader

Contributor Information

Boris Lasserre, Department of Genetics, Institut de l’Elevage, Lyon 69007, France.

Barbara Ducreux, Department of Genetics, Institut de l’Elevage, Lyon 69007, France.

Marjorie Chassier, Department of Genetics, Institut de l’Elevage, Lyon 69007, France.

Louise Joly, Department of Genetics, Institut de l’Elevage, Lyon 69007, France.

Pascal Cacheux, Department of Genetics, Institut de l’Elevage, Lyon 69007, France.

Thierry Le Morzadec, Department of Genetics, Institut de l’Elevage, Lyon 69007, France.

Stéphanie Dayde-Fonda, Department of Genetics, Institut de l’Elevage, Lyon 69007, France.

Caroline Gilbert, Ecole Nationale Vétérinaire, Maisons-Alfort 94704, France.

Conflict of interest statement

No conflicts of interest have been declared by the authors. The responsibility of the Ministry of Agriculture cannot be engaged.

Literature cited

Adhikari, S., Normand S.-L., Bloom J., Shahian D., and Rose S. 2021. Revisiting performance metrics for prediction with rare outcomes. Stat. Methods Med. Res. 30:2352–2366. doi: 10.1177/09622802211038754
Aprilliant, A. 2021. Cohen’s kappa and Fleiss’ kappa—how to measure the agreement between raters. Medium. Available from: https://audhiaprilliant.medium.com/cohens-kappa-and-fleiss-kappa-how-to-measure-the-agreement-between-raters-9ec12edef121
Arnott, E., Early J., Wade C., and McGreevy P. 2014. Estimating the economic value of Australian stock herding dogs. Anim. Welf. 23:189–197. doi: 10.7120/09627286.23.2.189
Arnott, E. R., Peek L., Early J. B., Pan A. Y. H., Haase B., Chew T., McGreevy P. D., and Wade C. M. 2015. Strong selection for behavioural resilience in Australian stock working dogs identified by selective sweep analysis. Canine Genet. Epidemiol. 2:6. doi: 10.1186/s40575-015-0017-6
Arvelius, P., Malm S., Svartberg K., and Strandberg E. 2013. Measuring herding behavior in Border collie—effect of protocol structure on usefulness for selection. J. Vet. Behav. 8:9–18. doi: 10.1016/j.jveb.2012.04.007
Breiman, L. 2001. Random forests. Mach. Learn. 45:5–32. doi: 10.1023/A:1010933404324
Bulut, O. 2021. Effective feature selection: recursive feature elimination using R. Medium. Available from: https://towardsdatascience.com/effective-feature-selection-recursive-feature-elimination-using-r-148ff998e4f7
China, L., Mills D. S., and Cooper J. J. 2020. Efficacy of dog training with and without remote electronic collars vs. a focus on positive reinforcement. Front. Vet. Sci. 7:508. doi: 10.3389/fvets.2020.00508
Conessa, J. 2013. Les chiens de troupeau en France au début du XXIème siècle. Ecole Nationale Vétérinaire d’Alfort. Available from: https://theses.vet-alfort.fr/telecharger.php?id=1666
Coppinger, R., and Coppinger L. 2001. Dogs: a startling new understanding of Canine origin, behavior & evolution.
Coren, S. 1994. The intelligence of dogs. New York: Free Press; Toronto: Maxwell Macmillan Canada; New York: Maxwell Macmillan International
Courreau, J. 1991. Les perspectives en sélection du chien de sport. Recl. Médecine Vét. 667–672
Courreau, J. 1998. Déterminisme génétique des aptitudes au travail chez le chien et perspectives pour la sélection de l’espèce. Bull. Académie Vét. Fr. 151:217–224. doi: 10.4267/2042/63643
Courreau, J. 2012. LE CHIEN DE CONDUITE - Un essai de caractérisation. Available from: https://www.chienbergerdauvergne.org/wp-content/uploads/2020/03/chienbergerdauvergne-le-chien-de-conduite-Jean_Franc%CC%A7ois_Courreau.pdf
Deldalle, S., and Gaunet F. 2014. Effects of 2 training methods on stress-related behaviors of the dog (Canis familiaris) and on the dog–owner relationship. J. Vet. Behav. 9:58–65. doi: 10.1016/j.jveb.2013.11.004
Ducreux, B. 2015. Chiens de conduite. Available from: https://idele.fr/chiens-de-troupeau/publications/detail-article?tx_atolidelecontenus_publicationdetail%5Baction%5D=showDossier&tx_atolidelecontenus_publicationdetail%5Bcontroller%5D=Detail&tx_atolidelecontenus_publicationdetail%5Bpublication%5D=570&cHash=e2a345c68845b0878129f630abc8c72e
Ducreux, B. n.d. Objectifs et actions - CANIDEA. Idele.fr. Available from: https://idele.fr/canidea/objectifs-et-actions
Duffy, D., Hsu Y., and Serpell J. A. 2008. Breed differences in canine aggression. Appl. Anim. Behav. Sci. 114:441–460. doi: 10.1016/j.applanim.2008.04.006
Fleiss, J. L. 1971. Measuring nominal scale agreement among many raters. Psychol. Bull. 76:378–382. doi: 10.1037/h0031619
Friedman, J. H. 2001. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29:1189–1232. doi: 10.1214/aos/1013203451
Gholamy, A., Kreinovich V., and Kosheleva O. 2018. Why 70/30 or 80/20 relation between training and testing sets: A pedagogical explanation. Dep. Tech. Rep. CS. 1209:7. Available from: https://scholarworks.utep.edu/cgi/viewcontent.cgi?article=2202&context=cs_techrep
Horn, S. S., Steinheim G., Olsen H. F., Gjerjordet H. F., and Klemetsdal G. 2017. Genetic analyses of herding traits in the Border Collie using sheepdog trial data. J. Anim. Breed. Genet. 134:144–151. doi: 10.1111/jbg.12234
Hoummady, S., Péron F., Grandjean D., Cléro D., Bernard B., Titeux E., Desquilbet L., and Gilbert C. 2016. Relationships between personality of human–dog dyads and performances in working tasks. Appl. Anim. Behav. Sci. 177:42–51. doi: 10.1016/j.applanim.2016.01.015
Husson, F., and Josse J. 2014. Multiple Correspondence Analysis. In: Visualization and Verbalization of Data. Chapman and Hall/CRC.
Husson, F., Le S., and Pagès J. 2017. Exploratory multivariate analysis by example using R. 2nd ed. New York: Chapman and Hall/CRC
Husson, F., Le Ray G., and Molto Q. 2023. HCPC: hierarchical clustering on principle components (HCPC) in FactoMineR: multivariate exploratory data analysis and data mining. Available from: https://rdrr.io/cran/FactoMineR/man/HCPC.html
Isnard, J. 2005. Etude des paramètres génétiques des qualités de travail du Border collie, chien de troupeau. Available from: https://theses.vet-alfort.fr/telecharger.php?id=616
Josse, J. 2010. Principal component methods—hierarchical clustering—partitional clustering: why would we need to choose for visualizing data?
Karjalainen, L., Ojala M., and Vilva V. 1996. Environmental effects and genetic parameters for measurements of hunting performance in the Finnish Spitz. J. Anim. Breed. Genet. 113:525–534. doi: 10.1111/j.1439-0388.1996.tb00641.x
Kuhn, M., and Johnson K. 2013. Applied predictive modeling. Available from: https://www.researchgate.net/publication/267989574_Applied_Predictive_Modeling
Landis, J. R., and Koch G. G. 1977. The measurement of observer agreement for categorical data. Biometrics. 33:159–174. doi: 10.2307/2529310
Liaw, A., and Wiener M. 2001. Classification and regression by randomForest. R News. 2:18–22. Available from: https://www.researchgate.net/publication/228451484_Classification_and_Regression_by_RandomForest
Liinamo, A. E., Karjalainen L., Ojala M., and Vilva V. 1997. Estimates of genetic parameters and environmental effects for measures of hunting performance in Finnish hounds. J. Anim. Sci. 75:622–629. doi: 10.2527/1997.753622x
Lord, K., Schneider R. A., and Coppinger R. 2016. Evolution of working dogs. In: Serpell J., editor. The domestic dog: Its evolution, behavior and interactions with people. 2nd ed. Cambridge University Press, Cambridge. p. 42–66. Available from: https://www.cambridge.org/core/books/domestic-dog/evolution-of-working-dogs/CC5083D37F741470DDFA69AFBB238AB1
Morton, A. J., and Avanzo L. 2011. Executive decision-making in the domestic sheep. PLoS One 6:e15752. doi: 10.1371/journal.pone.0015752
Pedamkar, P. 2019. GLM in R | Learn How to Construct Generalized Linear Model in R. EDUCBA. Available from: https://www.educba.com/glm-in-r/
Riemer, S., Müller C., Virányi Z., Huber L., and Range F. 2016. Individual and group level trajectories of behavioural development in Border collies. Appl. Anim. Behav. Sci. 180:78–86. doi: 10.1016/j.applanim.2016.04.021
Ruefenacht, S., Gebhardt-Henrich S., Miyake T., and Gaillard C. 2002. A behaviour test on German Shepherd dogs: heritability of seven different traits. Appl. Anim. Behav. Sci. 79:113–132. doi: 10.1016/s0168-1591(02)00134-x
Stephen, J., and Ledger R. 2007. Relinquishing dog owners’ ability to predict behavioural problems in shelter dogs post adoption. Appl. Anim. Behav. Sci. 107:88–99. doi: 10.1016/j.applanim.2006.09.012
Svartberg, K. 2006. Breed-typical behaviour in dogs—Historical remnants or recent constructs? Appl. Anim. Behav. Sci. 96:293–313. doi: 10.1016/j.applanim.2005.06.014
Tami, G., and Gallagher A. 2009. Description of the behaviour of domestic dog (Canis familiaris) by experienced and inexperienced people. Appl. Anim. Behav. Sci. 120:159–169. doi: 10.1016/j.applanim.2009.06.009
Tibshirani, R., Walther G., and Hastie T. 2001. Estimating the number of clusters in a data set via the gap statistic. J. Roy. Stat. Soc. Ser. B: Stat. Methodol. 63:411–423. doi: 10.1111/1467-9868.00293
Van der Waaij, E. H., Wilsson E., and Strandberg E. 2008. Genetic analysis of results of a Swedish behavior test on German Shepherd dogs and Labrador Retrievers. J. Anim. Sci. 86:2853–2861. doi: 10.2527/jas.2007-0616


