PLOS One. 2020 Jul 29;15(7):e0236271. doi: 10.1371/journal.pone.0236271

Behavior test for seven-week old puppies (Canis familiaris): Inter-rater reliability and factors associated with test performance

Daniela Alberghina 1,*, Fabiola Giunta 1, Mauro Gioè 2,3, Michele Panzera 1
Editor: Simon Clegg 4
PMCID: PMC7390333  PMID: 32726318

Abstract

Behavioral development in domestic dogs has been investigated for predicting suitability for service dog work, for matching puppies with the “right” families and for identifying predispositions to behavioral problems. Findings from the scientific literature seem to confirm that 7 weeks of age is too early for behavioral tests to reliably predict the temperament and personality of a dog. However, this is a sensitive period for early-life learning in domestic dogs, and conditions during this time could have important consequences in adulthood.

The aims of this study were to evaluate the inter-rater reliability of a simple standardized test and to investigate which factors influence the behavioral reactions of puppies. A total of 105 seven-week-old puppies were exposed to five subtests: social attraction, following, retrieving, sudden appearance and noise. During each subtest, the behaviour of each pup was scored on a 3–5 point scale that reflected the suitability of the pup’s reaction to the task. Scores were evaluated for each single subtest and for two aggregate indicators: response to a person (social attraction and following subtests) and response to object and noise (retrieving, sudden appearance and noise subtests). Three assessors independently scored the dogs’ reactions for each task. Inter-rater reliability of the three assessors was analyzed with Fleiss’ Kappa and Kendall’s coefficient, which showed high inter-rater reliability in 4 of 5 tasks. Ordered logistic regression was carried out to obtain a proportional odds model of the relationship between high scores and sex, litter size, stimulating environment, parity of mother and adequate maternal behavior. Litter size and maternal parity were associated with test performance in response to a person. The variance of the litter effect was high in response to object and noise. Taken together, our results suggest that, with this scoring system, the test has sufficient inter-rater reliability and that litter size and maternal experience influence performance on tasks related to dog-human interaction.

Introduction

Behavioral development in domestic dogs has been investigated for matching puppies with the right families, identifying predispositions for behavioral problems at an early stage, and predicting suitability for working-dog organizations, which select dogs at a young age to train as service dogs (e.g. guide dogs, hearing dogs, medical alert dogs). Puppy tests are typically aimed at investigating a variety of behavioural predispositions and often include interactions with unfamiliar people, play, exploration of novel environments or objects, and startle stimuli [1]. Puppy tests involve presenting a selection of tasks to puppies in a standardized manner, to allow for comparisons [2]. The period between 6 and 7 weeks of development may facilitate certain testing, since puppies have not yet fully developed the fear imprinting response and can be more easily handled by unknown people [3]. The potential for evaluation of dog-human relationships or the predisposition to learn from humans at this age could provide insights for improving adoptions, since the recommended age for putting puppies up for adoption is around 8 weeks [4]. Although weaning may occur from 4 to 6 weeks of age, a puppy should never be adopted before 7.5 to 8 weeks of age, since clinical observations indicate that the interaction occurring within the litter at this time and the effect of the mother are critical to a puppy's development, and early removal from the litter may result in emotional instability [5].

There are still concerns over the lack of standardisation in research on dog behavioral tests [6]. Published literature on puppy tests reveals that there has been little consistency in the tasks used, the age of testing and the form of evaluation or validation [7]. Furthermore, the validity of early tests for predicting specific behavioral traits in adult pet dogs is limited [1].

The behavioral assays of the puppy tests developed by Campbell and by Jack and Wendy Volhard are the most commonly used in practice for seven-week-old puppies [8]. The Volhards’ puppy aptitude test comprises 10 subtests that incorporate tasks from the Campbell test [9] and from the Puppy Temperament Test [10]. It also includes an additional three tasks to test responses to touch, sound and the sudden opening of an umbrella [8]. In each subtest, puppies are scored on a scale from 1 to 6 depending on their behavioural response. This test has rarely been used in the scientific literature, likely because the scoring method cannot be statistically analyzed (but see Goleman [11], Asher et al. [7] and Majecka et al. [8] for modified versions of the test).

In the present study, we employed five subtests from the Volhard test: social attraction, following, retrieving, sudden appearance and noise. These subtests were selected to evaluate “a response to a person” (social attraction and following subtests) and “a response to object and noise” (retrieving, sudden appearance and noise subtests). For this simplified test version, we implemented a new scoring protocol that allowed for statistical analysis. We investigated the inter-rater reliability of the test (the extent to which different observers describe the same individual in the same way) and whether breed size, sex, litter size, maternal parity, maternal care levels and environmental differences affect the behavior of seven-week-old puppies in these subtests.

Material and methods

Ethics statement

Special permission for use of animals (dogs) in this kind of behavioural study is not required in Italy. All procedures were performed in full accordance with Italian legal regulations (National Directive n. 26/14; Directive 2010/63/UE) and the guidelines for the treatment of animals in behavioral research and teaching of the Association for the Study of Animal Behavior (ASAB). Written consent to video-record and use data in an anonymous form was obtained from the breeders prior to testing.

Subjects

A total of 105 puppies (52% males and 48% females) from 21 litters belonging to 13 breeds were included in this study. Breeds were classified into 4 groups according to the expected mean adult body weight: small (10 kg and less), medium (11 to 25 kg), large (26 to 45 kg) and giant (over 45 kg) breeds (Table 1). The litter sizes varied from 1 to 13 (mean ± standard deviation: 5.23 ± 2.88).
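The size grouping described above can be written as a small lookup. This is only an illustrative sketch: the function name `breed_size_class` is hypothetical, the example weights are made up rather than taken from the study, and the handling of fractional weights between two bands (for example 10.5 kg) is an assumption, since the text only gives whole-kilogram boundaries.

```python
def breed_size_class(expected_adult_weight_kg):
    """Map an expected mean adult body weight (kg) to one of the four size groups.

    Assumption: a weight falling between two stated bands (e.g. 10.5 kg)
    is assigned to the heavier band.
    """
    if expected_adult_weight_kg <= 10:
        return "small"
    if expected_adult_weight_kg <= 25:
        return "medium"
    if expected_adult_weight_kg <= 45:
        return "large"
    return "giant"


# Hypothetical weights, for illustration only.
for weight in (8, 20, 40, 60):
    print(weight, "kg ->", breed_size_class(weight))
```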

Table 1. Puppies used in the study: size, breed, number of subjects, number of litters and sex (female and male).

| Size | Breed | Number of subjects | Litters | Female | Male |
|---|---|---|---|---|---|
| Small | French Bulldog | 9 | 2 | 3 | 6 |
| Small | Cavalier King Charles Spaniel | 5 | 2 | 2 | 3 |
| Small | Pomeranian | 4 | 2 | 1 | 3 |
| Medium | Chow Chow | 4 | 1 | 2 | 2 |
| Medium | Bull Terrier | 7 | 2 | 2 | 5 |
| Medium | Belgian Sheepdog (Groenendael) | 3 | 1 | 2 | 1 |
| Large | American Akita | 13 | 2 | 7 | 6 |
| Large | Cane Corso | 5 | 1 | 2 | 3 |
| Large | Mannara’s Dog | 10 | 2 | 5 | 5 |
| Large | Labrador Retriever | 9 | 2 | 5 | 4 |
| Giant | Caucasian Dog | 7 | 1 | 3 | 4 |
| Giant | Saint Bernard | 23 | 2 | 12 | 11 |
| Giant | Bernese Mountain Dog | 6 | 1 | 4 | 2 |

All puppies were tested at the breeders’ premises at the beginning of seven weeks of age (range 49–52 days). Information about maternal experience (36% primiparous, 64% multiparous) was collected directly by asking the breeders. Information on maternal behaviour was collected by asking the breeders whether they had observed “mother-pup interaction during feeding sessions, licking, contact, play, movement towards and away from the puppy”; a response of yes or no was classified as adequate (90%) or inadequate (10%) maternal behaviour, respectively. Information about the environment was obtained by direct inspection. The environment was classified as “stimulating” (42% of total observations) when kennels were located in the breeder’s house, where puppies and their mother were exposed to all the stimuli of a typical household. In contrast, the environment was classified as “not stimulating” (58% of total observations) when kennels were located outside the breeder’s house, with limited human contact.

Puppies were tested individually away from conspecifics and were tested prior to their normal feeding time in the late afternoon between 4.00 and 6.00 pm. Testing took place between December 2017 and February 2019. Involved subjects were not housed for use in further research.

Procedure

Tests were carried out in an environment unfamiliar to the puppies at the breeders’ homes. All tests were conducted by the same examiner, who was unfamiliar to the puppies prior to the test. A second person filmed the test for subsequent video analysis. The reaction of the puppy was video recorded during five subtests. A description of each subtest along with the scoring protocol is in Table 2. Each puppy received the 5 subtests in the same order and each test lasted about five minutes per puppy.

Table 2. Behavioral response and scoring protocol for the 5 subtests.

All puppies were individually recorded during each subtest in a novel space. Video recordings of their behavioral responses in each subtest were independently scored by three assessors on a scale whose lowest score was 60, 75 or 100 (depending on the number of response categories in the subtest) and whose highest score was 300.

| Subtest | Response | Score |
|---|---|---|
| Social attraction | Puppy doesn’t come at all or moves away in another direction | 60 |
| | Puppy starts to come but changes direction or stops | 120 |
| | Puppy comes after having gone in another direction | 180 |
| | Puppy goes to the examiner hesitantly | 240 |
| | Puppy goes to the examiner immediately | 300 |
| Following | Puppy stays in the same place / moves away in another direction | 100 |
| | Puppy follows the examiner hesitantly / follows the examiner immediately, tail straight up, and nibbles them | 200 |
| | Puppy follows the examiner readily | 300 |
| Retrieving | Puppy doesn’t chase object | 60 |
| | Puppy starts to chase object, but loses interest | 120 |
| | Puppy chases object, picks it up and runs away | 180 |
| | Puppy chases object and returns without it to the tester | 240 |
| | Puppy chases object, picks it up and returns with it to the tester | 300 |
| Sudden appearance | Puppy runs away / Puppy looks and runs to umbrella, mouthing or biting it | 75 |
| | Puppy looks at umbrella in an excited way (wagging his tail) but doesn't approach it | 150 |
| | Puppy moves to umbrella and attempts to investigate by teeth | 225 |
| | Puppy moves toward the umbrella and investigates in an excited way (sniffing and wagging his tail) | 300 |
| Noise | Puppy ignores the sound / Cringes, backs off | 100 |
| | Puppy listens and detects sound but doesn’t move to the source | 200 |
| | Puppy listens, detects sound and moves to the sound source | 300 |

During the testing phase, puppies were evaluated on how they responded to a person in the social attraction and following subtests (Fig 1). Puppies were also evaluated on their response to a novel object and sudden noise in the retrieving, sudden appearance and noise subtests (Fig 2). The behaviour of puppies in each subtest was scored on a scale with a maximum of 300, where a higher score represented a better response. The interval between adjacent scores was obtained by dividing the maximum score of 300 by the number of possible responses in the subtest (5, 4 or 3), giving steps of 60, 75 or 100. Three assessors independently examined each video and scored each puppy’s behavioral response in each subtest using the scoring protocol described in Table 2.
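The score arithmetic described in this paragraph can be reproduced in a few lines. The sketch below is illustrative only: the function name `score_scale` and the dictionary `CATEGORIES` are hypothetical, and the number of response categories per subtest is read off Table 2.

```python
def score_scale(n_categories, max_score=300):
    """Return evenly spaced scores ending at max_score, one per response category."""
    step = max_score / n_categories
    return [step * k for k in range(1, n_categories + 1)]


# Number of response categories per subtest, as listed in Table 2.
CATEGORIES = {
    "social attraction": 5,
    "following": 3,
    "retrieving": 5,
    "sudden appearance": 4,
    "noise": 3,
}

scales = {name: score_scale(n) for name, n in CATEGORIES.items()}
for name, scale in scales.items():
    print(name, scale)
```

The first element of each list is the distance between categories, which is the quantity Table 3 compares with the standard deviation between observers.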

Fig 1. Social attraction and following subtests performed respectively by an American Akita and a Mannara’s dog puppy.


Fig 2. Retrieving, sudden appearance and noise subtests performed respectively by a Labrador, a Saint Bernard and a Cavalier King Charles Spaniel puppy.


Social attraction subtest

Puppy is placed in the test area. Examiner kneels down and coaxes the puppy to come to them with encouragement and gently clapping hands.

Following subtest

Examiner stands up and slowly walks away encouraging the puppy to follow.

Retrieving subtest

The examiner crouches next to the puppy and attracts its attention with a crumpled piece of paper. When the puppy shows interest, the examiner rolls the paper a small distance from the puppy, encouraging it to pick up the paper.

Sudden appearance subtest

The examiner calls the puppy and, when it reaches a distance of 1 m, opens an umbrella and drops it immediately on the ground.

Noise subtest

The puppy is placed in the center of the testing area and the examiner, stationed at the perimeter, makes a sharp noise by banging a spoon on a pan.

Statistical analysis

The results were analyzed to assess inter-rater reliability and to identify the main factors associated with higher scores. The influence of the observer was evaluated through the variability of the scores assigned in each subtest. Fleiss’ Kappa (K) and Kendall’s coefficient of concordance (Kendall’s W) were calculated to assess the reliability of agreement between raters [12, 13]. Mean scores were analyzed with the proportional odds model for ordinal logistic regression, in which the variables for each subtest (sex, litter size, maternal parity, good maternal care and environment) were specified in an additive model. The proportional odds model is a class of generalized linear models used for modeling the dependence of an ordinal response on discrete or continuous covariates. Kendall’s Tau-b was used to correlate breed size with litter size. Since a high correlation was found (τb = 0.63, P < 0.001), we chose to use litter size as a covariate. Scores were evaluated for each single subtest and for two main types of responses (response to a person and response to object and noise). For these types of responses, mean scores were calculated from the mean of each subtest, i.e. the mean scores obtained by the 3 observers for the social attraction subtest plus the mean scores for the following subtest were used for “response to a person”. Scores were classified as low (mean 100), medium (mean 200) and high (mean 300). Due to the small sample size, P values < 0.10 were considered significant. R (version 3.3.2) was used for all analyses.
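For readers who want to see how the two agreement statistics are computed, the following is a minimal sketch in plain Python. It is not the authors' code (the analysis was run in R), the function names are hypothetical, and the ratings in `example_ratings` are invented for illustration. Each row is one puppy and each column one assessor. Kendall's W is computed with the usual correction for tied ranks, which matters here because a 3 to 5 point scale produces many ties.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of rows, one row of rater scores per subject.

    Undefined (division by zero) if every rating falls in a single category.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this subject.
        pairs = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += pairs / (n_raters * (n_raters - 1))
    p_observed = agreement_sum / n_subjects
    total = n_subjects * n_raters
    p_expected = sum((c / total) ** 2 for c in category_totals.values())
    return (p_observed - p_expected) / (1 - p_expected)


def average_ranks(values):
    """Rank values ascending (1-based); tied values share their mean rank.

    Also returns the tie term, the sum of t**3 - t over groups of t tied values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1
    return ranks, tie_term


def kendalls_w(ratings):
    """Kendall's coefficient of concordance with tie correction."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    rank_sums = [0.0] * n_subjects
    ties = 0
    for r in range(n_raters):
        ranks, tie_term = average_ranks([row[r] for row in ratings])
        ties += tie_term
        for i, rank in enumerate(ranks):
            rank_sums[i] += rank
    mean_sum = n_raters * (n_subjects + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    denominator = n_raters ** 2 * (n_subjects ** 3 - n_subjects) - n_raters * ties
    return 12 * s / denominator


# Invented scores for four puppies rated by three assessors (noise-style scale).
example_ratings = [
    [300, 300, 300],
    [100, 100, 200],
    [200, 200, 200],
    [100, 200, 300],
]
print("Fleiss' kappa:", fleiss_kappa(example_ratings))
print("Kendall's W:", kendalls_w(example_ratings))
```

Fleiss' kappa treats the score categories as unordered labels, whereas Kendall's W uses their rank order. A near miss between adjacent categories therefore lowers kappa more than it lowers W.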

Results

Reliability between observers

The standard deviation between observers was lower than the distance between categories. Table 3 shows K (where 1 indicates complete agreement and values at or below 0 indicate agreement no better than chance) and Kendall’s W (from 0: no agreement to 1: complete agreement) for each stimulus. Lower values were found for the sudden appearance subtest, while higher values were found for the retrieving and noise subtests.

Table 3. Standard deviation (SD) of scores between observers, and Fleiss’ K and Kendall’s W values.

| Test (distance between categories) | SD | K | W |
|---|---|---|---|
| Social attraction (60) | 24.92 | 0.55 | 0.86 |
| Following (100) | 26.5 | 0.57 | 0.83 |
| Retrieving (60) | 7.68 | 0.70 | 0.89 |
| Sudden appearance (75) | 20.71 | 0.44 | 0.76 |
| Noise (100) | 18.18 | 0.67 | 0.84 |

Proportional-odds cumulative logit model

Table 4 shows results from the proportional-odds cumulative logit model for each subtest. Variance due to the litter was very low in all tasks except the sudden appearance and noise subtests. Litter size significantly influenced the social attraction (P = 0.03), following (P = 0.06) and sudden appearance (P = 0.06) subtests. Scores of each single subtest were added for each aggregate indicator (“response to a person” and “response to object and noise”) and mean scores were used as follows for evaluation: 300 (high), 200 (medium) and 100 (low). Table 5 shows results from the proportional-odds cumulative logit model for each indicator. “Response to a person” was significantly influenced by litter size (P = 0.02) and mother parity (P = 0.06). For the “response to object and noise” indicator, the variance due to the litter was high. As shown in Fig 3, for the response to a person indicator, puppies from small litters, as well as puppies from multiparous mothers, showed a tendency to have higher scores than others.
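The general form of a proportional-odds cumulative logit model can be written as below. This is a generic sketch and not the authors' exact specification: the symbols $\theta_j$, $\mathbf{x}_i$, $u_{\ell(i)}$ and $\sigma^2_{\text{litter}}$ are notation introduced here, the litter random intercept is inferred from the "variance in the logit of the scores due to the litter" reported in Tables 4 and 5, and the sign convention is assumed from the intercept rows labelled "log odds of lower scores".

```latex
\operatorname{logit} P(Y_i \le j)
  = \log \frac{P(Y_i \le j)}{P(Y_i > j)}
  = \theta_j + \mathbf{x}_i^{\top} \boldsymbol{\beta} + u_{\ell(i)},
\qquad
u_{\ell} \sim \mathcal{N}\!\left(0, \sigma^2_{\text{litter}}\right),
\qquad j = 1, \dots, J - 1
```

Here $Y_i$ is the ordinal score of puppy $i$, $J$ is the number of score categories, and $\ell(i)$ is the litter of puppy $i$. Under this assumed convention a positive $\beta$ raises the odds of falling in a lower score category, and $\exp(\beta)$ is the odds ratio per unit increase in the covariate, taken to be the same at every cut-point $j$ (the proportional odds assumption). This reading agrees with the text: the positive litter size coefficient goes with higher scores in small litters, and the negative parity coefficient goes with higher scores for puppies of multiparous mothers.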

Table 4. Estimate of effects on proportional odds cumulative logit scale, SE, P and 90% Confidence Intervals for each subtest.

α = 0.10; β = log odds.

Social attraction (variance in the logit of the scores due to the litter = 1.2e-08)

| Term | β | SE (β) | P-value | Confidence Interval (90%) |
|---|---|---|---|---|
| Intercept (log odds of lower scores) | -0.37 | 1.64 | 0.82 | [-3.06, 2.33] |
| Litter size | 0.17 | 0.08 | 0.03 | [0.04, 0.29] |
| Suitable Environment | 0.03 | 0.43 | 0.95 | [-0.68, 0.74] |
| Sex (Male) | 0.06 | 0.37 | 0.88 | [-0.55, 0.66] |
| Mother (adequate maternal behaviour) | -0.83 | 0.81 | 0.30 | [-2.16, 0.50] |
| Multiparous mother | -0.48 | 0.38 | 0.20 | [-1.09, 0.14] |

Following (variance in the logit of the scores due to the litter = 2e-08)

| Term | β | SE (β) | P-value | Confidence Interval (90%) |
|---|---|---|---|---|
| Intercept (log odds of lower scores) | 0.35 | 1.69 | 0.73 | [-2.43, 3.13] |
| Litter size | 0.14 | 0.08 | 0.06 | [0.01, 0.26] |
| Suitable Environment | 0.10 | 0.44 | 0.83 | [-0.63, 0.82] |
| Sex (Male) | 0.23 | 0.38 | 0.55 | [-0.40, 0.86] |
| Mother (adequate maternal behaviour) | -0.77 | 0.85 | 0.36 | [-2.16, 0.62] |
| Multiparous mother | -0.46 | 0.40 | 0.26 | [-1.12, 0.21] |

Retrieving (variance in the logit of the scores due to the litter = 1.2e-07)

| Term | β | SE (β) | P-value | Confidence Interval (90%) |
|---|---|---|---|---|
| Intercept (log odds of lower scores) | -1.20 | 1.78 | 0.48 | [-4.13, 1.74] |
| Litter size | 0.07 | 0.07 | 0.34 | [-0.05, 0.19] |
| Suitable Environment | -0.37 | 0.44 | 0.40 | [-1.09, 0.35] |
| Sex (Male) | -0.21 | 0.38 | 0.58 | [-0.84, 0.42] |
| Mother (adequate maternal behaviour) | 0.35 | 0.87 | 0.69 | [-1.09, 1.78] |
| Multiparous mother | 0.21 | 0.39 | 0.59 | [-0.43, 0.86] |

Sudden appearance (variance in the logit of the scores due to the litter = 0.59)

| Term | β | SE (β) | P-value | Confidence Interval (90%) |
|---|---|---|---|---|
| Intercept (log odds of lower scores) | 3.30 | 2.00 | 0.10 | [-0.02, 6.56] |
| Litter size | -0.22 | 0.12 | 0.06 | [-0.41, -0.03] |
| Suitable Environment | -0.75 | 0.61 | 0.22 | [-1.76, 0.26] |
| Sex (Male) | -0.17 | 0.40 | 0.67 | [-0.83, 0.48] |
| Mother (adequate maternal behaviour) | -1.22 | 0.96 | 0.20 | [-2.80, 0.36] |
| Multiparous mother | -0.04 | 0.56 | 0.94 | [-0.97, 0.89] |

Noise (variance in the logit of the scores due to the litter = 0.48)

| Term | β | SE (β) | P-value | Confidence Interval (90%) |
|---|---|---|---|---|
| Intercept (log odds of lower scores) | 1.4 | 2.0 | 0.48 | [-1.86, 4.59] |
| Litter size | 0.10 | 0.11 | 0.3 | [-0.08, 0.29] |
| Suitable Environment | -0.85 | 0.63 | 0.2 | [-1.88, 0.18] |
| Sex (Male) | -0.26 | 0.41 | 0.5 | [-0.93, 0.41] |
| Mother (adequate maternal behaviour) | -0.94 | 0.97 | 0.3 | [-2.54, 0.65] |
| Multiparous mother | 0.07 | 0.57 | 0.9 | [-0.86, 1.00] |

Table 5. Estimate of effects on proportional odds cumulative logit scale, SE, P and 90% Confidence Intervals.

α = 0.10; β = log odds.

Response to a person (variance in the logit of the scores due to the litter = 6.2e-09)

| Term | β | SE (β) | P-value | Confidence Interval (90%) |
|---|---|---|---|---|
| Intercept (log odds of lower scores) | -0.26 | 1.28 | 0.84 | [-2.37, 1.84] |
| Litter size | 0.15 | 0.06 | 0.02 | [0.04, 0.25] |
| Mother (adequate maternal behaviour) | -0.21 | 0.66 | 0.75 | [-1.29, 0.87] |
| Multiparous mother | -0.73 | 0.39 | 0.06 | [-1.37, -0.10] |

Response to object and noise (variance in the logit of the scores due to the litter = 0.23)

| Term | β | SE (β) | P-value | Confidence Interval (90%) |
|---|---|---|---|---|
| Intercept (log odds of lower scores) | 1.41 | 1.81 | 0.44 | [-1.57, 4.38] |
| Litter size | 0.03 | 0.09 | 0.73 | [-0.12, 0.19] |
| Suitable Environment | -0.88 | 0.54 | 0.10 | [-1.75, 0.005] |
| Sex (Male) | 0.07 | 0.40 | 0.87 | [-0.59, 0.73] |
| Mother (adequate maternal behaviour) | -0.74 | 0.87 | 0.39 | [-2.17, 0.68] |
| Multiparous mother | 0.19 | 0.49 | 0.70 | [-0.62, 0.99] |
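The log-odds estimates for the “response to a person” indicator can be turned into odds ratios with a short calculation. The sketch below is an illustration, not the authors' procedure. The helper `wald_summary` is hypothetical, and it assumes a normal-approximation (Wald) interval, β ± z·SE. Because the inputs are the rounded β and SE values from the table, the recomputed limits may differ slightly from the published ones.

```python
import math
from statistics import NormalDist


def wald_summary(beta, se, level=0.90):
    """Odds ratio and Wald confidence interval from a log-odds estimate.

    Assumption: a normal-approximation interval, beta +/- z * se.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = beta - z * se, beta + z * se
    return {
        "odds_ratio": math.exp(beta),
        "ci_log_odds": (lower, upper),
        "ci_odds_ratio": (math.exp(lower), math.exp(upper)),
    }


# Rounded estimates for the "response to a person" indicator (Table 5).
litter_size = wald_summary(0.15, 0.06)
multiparous = wald_summary(-0.73, 0.39)

for name, result in (("litter size", litter_size), ("multiparous mother", multiparous)):
    print(name, result)
```

With these inputs both recomputed 90% intervals exclude zero on the log-odds scale, which is what P values below the 0.10 threshold used in this study would lead one to expect.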

Fig 3. Influence of litter size and mother parity on “response to human” indicator.


Discussion

We investigated the inter-rater reliability of a simple puppy test and the factors that influence performance on it. Though only three observers scored puppy behaviour, the test categories clearly represented puppy behaviour (Kendall’s W very high). Ratings of individual subtests showed a high degree of concordance except for the sudden appearance subtest. Although this subtest needs some adjustments to its protocol, the low variability and high values of Kendall’s W support the credibility of this test. The inter-observer variability among the three observers was low. There is little reporting of inter-rater agreement across two or more independent observers [6]. These findings indicate that, with this scoring system, the same puppy could be evaluated similarly by different individuals. Inter-rater reliability is a critical assurance that the scoring system is well defined and can be replicated [14]. To better standardize the test, adjustments to the scoring protocol of the sudden appearance subtest are recommended. In particular, there should be no distinction between the behavioral responses “Puppy moves to umbrella and attempts to investigate by teeth” and “Puppy moves toward the umbrella and investigates in an excited way (sniffing and wagging his tail)”. This is because, for puppies at 7 weeks of age, using their mouths is their main way to interact with everything in their environment. Puppies start to learn bite inhibition while with their littermates. Thus, we propose removing the third behavioral response and keeping the remaining three behavioral responses for the sudden appearance subtest.

In the model, five covariates are included as predictive factors of high subtest scores: sex, litter size, stimulating environment, parity of bitch and maternal care levels. The results of the regression in Table 5 show that litter size and mother parity are significant predictors of behavioral response. Litter size is a complex factor that is related to genetic and environmental factors. For instance, a puppy that interacts with more conspecifics during the socialization period can develop differently than a puppy with few or no interactions with conspecifics. Previous research on a population of German Shepherds reported that factors such as litter size, sex ratio, growth rate and season of birth can significantly affect behaviour [15]. Litter size also influences the weight and growth of puppies during the first month of age [16]. Weight has an impact on health and on certain behavioral aspects [17], with larger female puppies being more active and explorative than their smaller counterparts when subjected to a behaviour test at 8 weeks of age [16]. Litter size also influences behavioral responses in other altricial species [18, 19]. In a small litter, the possibility of physical contact between the mother and any sibling is greater, which could be related to better performance on behavioral tests.

Levels of canine maternal care have been reported to affect performance on a temperament test conducted at 15–18 months of age [20]. A questionnaire-based study asked dog owners to grade the quality of maternal care, specified as spending time with and taking care of the pups. Lower scores, indicating poor quality of maternal care as estimated by the owner, were associated with fearful behaviour in the adult dog [21]. A longer daily duration of maternal care during the first three weeks postpartum was associated with more exploratory behaviour and fewer signs of stress in eight-week-old puppies [22]; however, no data are available for younger ages. Furthermore, a recent review of the literature confirms that the behaviour of an adult dog is determined to a large extent by the quality of maternal care, its attachment style to its mother, and the variety of both social and non-social stimuli provided during the early and late socialisation period [23]. Unlike the reports in the literature, our results did not show an effect of adequate maternal behaviour on performance in the subtests. This could be due to the subjectivity of the breeders’ judgement of this behaviour. For future studies, maternal behaviour should also be recorded directly to determine whether it has an impact on responses to these tests. More specifically, to better standardize the observed behaviours, maternal behavior should be recorded 1 day per week, continuously every second hour over a 24-hour period, during the first 3 weeks postpartum, as described by Foyer et al. [20].

In this study, puppies raised by experienced mothers had a tendency to perform better in the subtests related to “response to a person”. In previous studies, maternal parity influenced the behavior of puppies of different ages in some subtests [15, 16]. Furthermore, any change in maternal behaviour due to previous reproductive experience may affect the behaviour of puppies. Evaluation of the records of German shepherd dogs from the Swedish armed forces demonstrated that puppies from more experienced bitches scored better for confidence and physical engagement when tested as young adult dogs [15]. The effect of parity should therefore be further explored in future studies.

Our results showed a trend for suitable environment having an impact on response to the object and noise aggregate indicator (P = 0.1). Dogs raised in domestic environments were less likely to develop fear and aggression towards unfamiliar people compared to dogs raised in non-domestic environments [24]. Sufficient exposure to relevant stimuli during the early socialisation period appears to be associated with lower fearfulness and aggression in dogs [25].

Our simplified version of the Volhard test is quick, easy to administer, free from physically uncomfortable situations and feasible in different environments. We recommend testing puppies at seven weeks of age because this age is in the middle of the socialization period and just before the fear period (which can start at about 8 weeks of age, but whose onset varies between breeds and individuals) [5]. In future studies of puppy behavioral tests, we recommend including litter size and parity of mother as predictors of scores in the “response to a person” subtests. We identified variability in our data due to a litter effect in the “response to object and noise” subtests. Litter effect should always be taken into consideration because differences in behavior among individuals may arise from a common litter environment or from hereditary factors [8]. Due to the small sample size and the large heterogeneity of the factors, these results have to be considered preliminary findings. In our study, all breeds, even with size differences, were treated as a single breed because our data were not adequately balanced to include breed as a factor.

Conclusion

Our results support the hypothesis that specific context influences the performance of seven-week-old puppies on a behavioural test. This study shows that there is sufficient inter-rater reliability in the test. The scoring system designed for the test can be used reliably by different people and for quantitative analysis. Further work is needed to determine whether the effects of these factors on performance remain consistent when dogs are retested at a later stage.

Supporting information

S1 Data

(XLSX)

S1 Fig

(PNG)

Acknowledgments

We wish to thank the breeders for their cooperation in conducting this research. We would like to thank Dr. Winnie Y. Chan for her advice and help in English correction and Dr. Alessandra Statelli for her assistance with manuscript preparation. We also thank the anonymous reviewers for their comments and suggestions to improve the quality of this article.

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

The authors received no specific funding for this work.

References

1. Riemer S, Müller C, Virányi Z, Huber L, Range F. The predictive value of early behavioural assessments in pet dogs – a longitudinal study from neonates to adults. PLoS ONE, 2014; 9: e101237. doi: 10.1371/journal.pone.0101237
2. Barnard S, Marshall-Pescini S, Passalacqua C, Beghelli V, Capra A, Normando S, et al. Does subjective rating reflect behavioural coding? Personality in 2 month-old dog puppies: an open-field test and adjective-based questionnaire. PLoS ONE, 2016; 11(3): e0149831. doi: 10.1371/journal.pone.0149831
3. Scott JP. The process of primary socialization in canine and human infants. Monogr Soc Res Child Dev, 1963; 28(1): 1–45.
4. Miklósi A. The Dog: A Natural History. Princeton University Press; 2018. ISBN 978-0-691-17693-2.
5. Luescher AU. Canine behavior and development. In: Canine and Feline Behavior for Veterinary Technicians and Nurses (eds Shaw JK and Martin D). 2017.
6. Brady K, Cracknell N, Zulch H, Mills DS. A systematic review of the reliability and validity of behavioural tests used to assess behavioural characteristics important in working dogs. Front Vet Sci, 2018; 5: 1–10.
7. Asher L, Blythe S, Roberts R, Toothill L, Craigon PJ, Evans KM, et al. A standardized behavior test for potential guide dog puppies: Methods and association with subsequent success in guide dog training. J Vet Behav Clin Appl Res, 2013; 8: 431–438.
8. Majecka K, Pąsiek M, Pietraszewski D, Smith C. Behavioural outcomes of housing for domestic dog puppies (Canis lupus familiaris). Appl Anim Behav Sci, 2020; 222: 104899.
9. Pérez-Guisado J, Munoz-Serrano A, Lopez-Rodriguez R. Evaluation of the Campbell test and the influence of age, sex, breed, and coat color on puppy behavioral responses. Can J Vet Res, 2008; 72: 269–277.
10. Lindsay SR. Handbook of applied dog behavior and training, Vol. 2: Etiology and Assessment of Behavior Problems. Iowa State University Press, Iowa; 2001.
11. Goleman M. Impact of sex, age and raising place on puppies’ aptitude test results. Rocz. Nauk. Pol. Tow. Zootech., 2010; 6: 37–43.
12. Altman DG. Practical statistics for medical research. New York: Chapman & Hall/CRC Press; 1999.
13. Legendre P. Coefficient of Concordance. In: Salkind NJ, ed. Encyclopedia of Research Design. SAGE Publications Inc; 2010, Vol 1, 164–169.
14. Sherman BL, Gruen ME, Case BC, Foster ML, Fish RE, Lazarowski L, et al. A test for the evaluation of emotional reactivity in Labrador retrievers used for explosives detection. J Vet Behav Clin Appl Res, 2015; 10: 94–102.
15. Foyer P, Wilsson E, Wright D, Jensen P. Early experiences modulate stress coping in a population of German shepherd dogs. Appl Anim Behav Sci, 2013; 146: 79–87.
16. Wilsson E, Sundgren PE. Effects of weight and litter size and parity of mother on the behaviour of the puppy and the adult dog. Appl Anim Behav Sci, 1998; 56: 245–254.
17. Schrank M, Mollo A, Contiero B, Romagnoli S. Bodyweight at Birth and Growth Rate during the Neonatal Period in Three Canine Breeds. Animals (Basel), 2020; 10(1): 8.
18. Seitz PF. The effects of infantile experiences upon adult behavior in animal subjects. I. Effects of litter size during infancy upon adult behavior in the rat. Am J Psychiatry, 1954; 110: 916–927. doi: 10.1176/ajp.110.12.916
19. D'Eath RB, Lawrence B. Early life predictors of the development of aggressive behaviour in the domestic pig. Anim Behav, 2004; 67: 501–509.
20. Foyer P, Wilsson E, Jensen P. Levels of maternal care in dogs affect adult offspring temperament. Sci Rep, 2016; 6: 19253. doi: 10.1038/srep19253
21. Tiira K, Lohi H. Early life experiences and exercise associate with canine anxieties. PLoS ONE, 2015; 10: e0141907. doi: 10.1371/journal.pone.0141907
22. Guardini G, Mariti C, Bowen J, Fatjó J, Ruzzante S, Martorell A, et al. Influence of morning maternal care on the behavioural responses of 8-week-old Beagle puppies to new environmental and social stimuli. Appl Anim Behav Sci, 2016; 181: 137–144.
23. Dietz L, Arnold AK, Goerlich-Jansson VC, Vinke CM. The importance of early life experiences for the development of behavioural disorders in domestic dogs. Behaviour, 2018; 155: 83–114.
24. Appleby DL, Bradshaw JWS, Casey R. Relationship between aggressive and avoidance behaviour by dogs and their experience in the first six months of life. Vet Rec, 2002; 150: 434–438. doi: 10.1136/vr.150.14.434
25. Pullen AJ, Merrill RJN, Bradshaw JWS. The effect of familiarity on behavior of kennel housed dogs during interactions with humans. Appl Anim Behav Sci, 2012; 137: 66–73.

Decision Letter 0

Simon Clegg

27 Mar 2020

PONE-D-20-02339

Development of a simple standardized test to evaluate influencing factors on the behavior of seven week puppies (Canis familiaris): a preliminary study

PLOS ONE

Dear Dr Alberghina

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Many thanks for submitting your manuscript to PLOS One

Your manuscript was reviewed by two experts in the field. They have both provided a large number of comments and concerns about the manuscript, which need to be addressed prior to acceptance.

As both reviewers see value in the work, I have given you the opportunity to work on the comments suggested and to resubmit it for re-review

Please ensure that you write a detailed response to reviewers, covering each of their points raised

I wish you the best of luck with your revisions

Hope you are keeping safe in these difficult times

Thanks

Simon

==============================

We would appreciate receiving your revised manuscript by May 11 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Simon Russell Clegg, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please provide additional details regarding the source of the dogs used in your study and ensure you have described where the dogs were obtained from.

3. We noticed you have some minor occurrence of overlapping text with the following previous publications, which needs to be addressed:

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0101237

https://brill.com/view/journals/beh/155/2-3/article-p83_1.xml?language=en

In your revision ensure you cite all your sources, and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I am afraid this manuscript needs quite a bit of extra work before I would consider it to be publishable. The methods are not described in enough detail, I am not convinced that the statistical analysis is appropriate (no control for dogs being of the same litter), the results are not fully reported (only ‘significant’ statistics are given) and the findings are overstated. I’ve provided specific comments below:

The abstract describes part of the aim as developing a test, but this isn’t what the authors did; they used an existing test and looked at factors that were associated with different scores on the test plus inter-observer reliability. Please make it clearer what the study actually did.

Line 33: the effect sizes for breed-group were small, you don’t have grounds to claim ‘considerable influence’ on the test performance. There was some association at best.

Line 47: “may facilitate certain testing since puppies are motivated to approach unknown people” I’m not sure what point you are making here. Do you mean that because the fear response hasn’t fully developed yet that they can be more easily handled by strangers to conduct the test?

Line 49: remove ‘stage of’ so it reads “at this age”

Line 54-56: I have problems with this whole section that it overstates the impact of genetics on behaviour. What we know from research is that dogs are individuals, and breed has little to do with general personality – hence the large amount of within-breed variance. Only around 30% of personality is heritable, of which a large part is probably due to shared early environment (including the womb environment). This bit in particular needs to be written with more caution. Only some of what determines a dog’s personality is affected by its genetics, and very little by its breed. There is a lot of misunderstanding rife in the public about how much breed influences dog personality and as dog scientists we must be very careful we do not fuel this further. When testing puppies at such a young age, estimates of heritability are stronger as variability is reduced, but this is to be expected since the dogs have all shared the same life experience so far. When heritability of behaviour (and variance) is examined in older dogs, once environment has had a chance to influence them, it drops dramatically showing that early estimates are over-estimates. This is an inherent problem with drawing conclusions about genetic influence from variability shown under limited, standardised conditions.

Line 89: what information did you collect about maternal care levels and how?

Line 91 & 93: ‘stimulating’ and ‘non-stimulating’ both are 42%, so what were the remaining 16%?

Line 112: this makes no sense “Scoring 300 the most suitable reaction, interval between behavior options for each subtest were obtained dividing by 5, 4 or 3 depending of the number of observed reactions”. Please re-phrase this.

Statistical analysis: please use inter-observer reliability, instead of ‘variability’ as the two terms mean different things, and reliability is the most commonly used/understood. This also relates to your Abstract, where you state inter-observer variability as being low, but it would be much easier to understand if you said inter-observer reliability was high. Please also state here what cut-offs you were prepared to consider acceptable in the K and W statistics: these decisions should always be made before you run your analysis. How were the scores distributed? Did you check for normality first? Since you’re working with dogs from litters, the individual dogs are not independent of each other so ‘litter’ should be taken into consideration as a random effect in any analysis. As far as I can tell you did not control for litter in this analysis, which means you have not adjusted for pseudo-replication that using dogs of the same litter would cause. How were the ‘complex indicators’ calculated? Please describe this. Considering the large number of comparisons made here I would also expect to see P-value correction to account for multiple testing.

Table 3 & 4 should be merged into one. You should also provide the confidence interval for the K and W coefficients.

Table 5 must report all values – not just ‘significant’ ones. It’s also important to include the confidence interval around the Beta estimate, as p-values alone are essentially meaningless.

Line 165: The W values were high, but not ‘very high’ (>0.90 would be very high).

Line 170: typo – ‘avaluated’

Line 175: typo – ‘indipendently’

Line 180: repeated word “in test social attraction test”

Line 203: What does this mean: “positive influence on “learning predisposition” indicator in pups”?

You say in the abstract that “In order to better standardize the test, adjustments to the scoring protocol are recommended” yet I can’t find any such recommendations anywhere.

There’s one sentence on the study limitations, which isn’t really good enough.

Reviewer #2: The current manuscript examines responses of 7-week old puppies to a series of behavioural tests, examining both interobserver reliability for scoring for the tests, and the effects of various puppy-related and environment-related variables on their responses. Examining early factors and their influence on puppy and later adult behaviour is an important topic. However, I have some questions about the overall objective of the current work as well as the approach that was taken. I have summarized these concerns below, and provided specific relevant comments below in the ‘general comments’ section

1. Based on the information provided in the introduction, the rationale for this particular study is unclear and needs strengthening. Why do we need to understand how different factors influence the behaviour of young puppies when previous studies have shown that performance on tasks at this age has poor predictive power for later behaviour? I recommend that the authors reframe the introduction to clearly outline what research on puppy testing has already been published and why this particular study is necessary and important – I’m assuming it is to understand sources of variability so that test performance can be improved, but this isn’t clear from what is written.

2. It’s unclear why the authors selected to use these particular tests for testing the puppies, so the justification needs to be strengthened. Why are these areas of assessment important, and why did they select this particular test when it hasn’t been used in any previously published work? Also, the methods that were used for testing are not described in sufficient detail for proper assessment of rigour and for replication, and some of the descriptions for scoring in Table 2 are quite vague. Were these the final descriptions that were used for the observers during scoring?

3. The methods and results for the regression analysis are a bit confusing and this makes it difficult to properly assess the findings and conclusions for this study. Areas where more information is needed are described in more detail below. However, based on the information provided it appears that the analysis doesn’t account for clustering, which is critical when examining litters of puppies.

General Comments

Data statement: The authors have indicated that all data are present in the manuscript, but only aggregate data is presented. The raw data is not available for review.

The manuscript requires further editing for grammar and spelling, particularly the discussion.

L16-19: The aims for the current study are unclear. If 7 weeks of age is too young to predict later temperament, why is it important to understand factors that influence behaviour at this age? Clarification is needed

L20: replace stimuli with tests or tasks

L21-22: I wonder if the suitability of the response depends on the purpose for which the puppy has been bred? Is there a way to word this that is descriptive rather than suggesting that a particular response is best?

Complex indicators: What is the rationale for combining particular tasks together into complex indicators. Is there some indication that responses on these tasks are related?

L27-30: The results for these complex indicators are not presented in the main body of the manuscript.

Introduction: The intro is a bit confusing because the authors start out discussing the potential for early testing to predict later potential, but this does not actually relate to their objective of looking at factors that influence the behaviour of puppies during early behaviour testing. The rationale for this particular study needs to be explained more clearly.

L44-46: The following sentences aren't really necessary for the introduction: "Analyses of behaviour normally involves measuring frequency, duration and latency of specific behaviours [3]. Rater-coding is normally done on a predetermined scale that may have only 3, 5, or 7 points [4]."

L47: Can you clarify why increased motivation to approach might be useful for these assessments?

L50-53: Given the objective of this study, it is important to summarize the literature on puppy assessments that have been published to date. What have people tried, what did they find and why is it inadequate and requiring further study? There is quite a large literature on this topic (both general and applied to selection for certain programs) and few studies are cited above but very little detail is provided to develop the rationale for the current study.

L60-61: Please clarify that these tests are used in practice for assessment, and that these aren't methods that have been used and assessed in the literature. What is your rationale for using these tests rather than other tests that have been used and assessed previously in the peer-reviewed literature?

L63: Missing space in "outthe"

L63-65: Is it necessary to provide this information when the authors chose not to use these categories? Instead please provide further justification for the approach that you chose to go with (ie, genetic relatedness). If you do decide to keep this information in the paper, please state clearly why this approach is inappropriate for your purposes.

L69-70: The objectives for the current study could use some clarification. What was your rationale for testing these specific factors, and did you have any predictions about how these factors would affect puppy behaviour beforehand?

L74: Add an 's' to ethics

L85: Add an 's' to breeds

L89-90: How, specifically, was the level of maternal care evaluated in the current study?

L91-93: This only accounts for 84% of observations - how were the others classified?

L95-96: Were the puppies tested at a particular time of day?

Table 1: Please expand this table to list the specific litters and number of puppies per litter to provide a better representation of the sample.

L110-113: I'm having trouble understanding the methods described here - please clarify by providing further details

Table 2: The rows on the table don't line up properly so it is difficult to determine which subtests the responses align with.

Table 2: Fix spelling for Appearance

Table 2: Since this isn't a series of tests that has been previously published in the literature, please describe the methods in sufficient detail that they can be replicated.

Table 2 - Sudden Appearance Test: What does curiosity refer to specifically? And for the final category, what constituted investigation?

Table 2 - Noise test: Again, what specific behavioural responses were indicative of curiosity?

L126-127: Please provide basic information and a reference for interpretation of these values. There are standard cutoffs for interpretation that are commonly used in the literature.

L127-129: Further details are needed on the methods used for statistical modelling. Does mean score refer to the mean of all three observers, and/or were the scores converted in some manner to categories prior to ordinal logistic regression? Also, what methods, if any, were used to account for clustering with litter (e.g., was litter included as a random effect)? We can expect that litter effects are likely quite large, so realistically your sample size is reduced to the number of litters unless you can account for clustering in some way.

Table 3: Why were standard deviations calculated? This seems unnecessary when Kappa and Kendall's are presented.

L150-152: Please move this information to the M&M. Does this mean that scores were re-categorized as 0-100, 101-200, 201-300 prior to analysis? What was the rationale for this decision?

L153-157: The figures alone do not provide sufficient information for evaluating the results from these models. Please provide the relevant values to support the comparisons that are made within the text or in Table 5.

Table 5: What is your justification for using a significance level of 0.1? Info should be provided in M&M

Table 5: I'm struggling a bit to interpret the information provided for differences amongst breed groups. For example, what was the overall value for genetic group and which referent are these in comparison to? Or was each group looked at separately in comparison to others not in that group? Apologies if I'm misinterpreting how this was done but it isn't clear to me.

Table 5: Sudden appearance test – typo

L164: The agreement between observers suggests reasonable agreement for scoring, but I'm not sure how this relates to clear representations of puppy behaviour. Please explain

L165: What are your interpretations of these statistics based on - please provide references.

L170: While the Kendall's W values are good, the Kappa values are in the moderate agreement range. It might be worth discussing why both were used, and differences in interpretation between the two.

L189: Can you be more specific about what aspects of behaviour were affected in this previous study?

Discussion: The discussion might benefit from additional consideration of how the current puppy tests are similar/dissimilar from previous research in this area.

L215-216: The final sentence of the conclusion is not supported by the data presented.

L268: This reference is not available at the link provided.

Figure 6: Should the blue label read Unsuitable or Unstimulating?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jul 29;15(7):e0236271. doi: 10.1371/journal.pone.0236271.r002

Author response to Decision Letter 0


13 May 2020

We agree with all reviewer comments and we have revised our manuscript accordingly. We are including all reviewers’ suggestions and clarifying the text when needed. We are confident that the new version of the manuscript will be greatly improved.

Reviewers' comments

Reviewer #1: I am afraid this manuscript needs quite a bit of extra work before I would consider it to be publishable. The methods are not described in enough detail, I am not convinced that the statistical analysis is appropriate (no control for dogs being of the same litter), the results are not fully reported (only ‘significant’ statistics are given) and the findings are overstated. I’ve provided specific comments below:

The abstract describes part of the aim as developing a test, but this isn’t what the authors did; they used an existing test and looked at factors that were associated with different scores on the test plus inter-observer reliability. Please make it clearer what the study actually did.

We have modified the title of the manuscript as follows: “Behavior test for seven-week old puppies (Canis familiaris): inter-rater reliability and factors associated with test performance”, and we have modified the abstract. We reanalysed our data and present the new results in the abstract and in the text.

Line 33: the effect sizes for breed-group were small, you don’t have grounds to claim ‘considerable influence’ on the test performance. There was some association at best.

We have decided to exclude the breed effect because, as suggested by the reviewer, breed itself has little to do with general personality. Moreover, our sample was too small to evaluate differences between breeds. Instead, we reclassified dog breeds according to size, as there is evidence for size-related differences in personality, and reanalyzed our data. Using Kendall’s Tau-b, a high correlation (0.63) was found between breed size and litter size.

Line 47: “may facilitate certain testing since puppies are motivated to approach unknown people” I’m not sure what point you are making here. Do you mean that because the fear response hasn’t fully developed yet that they can be more easily handled by strangers to conduct the test?

We have modified this sentence

Line 49: remove ‘stage of’ so it reads “at this age”

Amended

Line 54-56: I have problems with this whole section that it overstates the impact of genetics on behaviour. What we know from research is that dogs are individuals, and breed has little to do with general personality – hence the large amount of within-breed variance. Only around 30% of personality is heritable, of which a large part is probably due to shared early environment (including the womb environment). This bit in particular needs to be written with more caution. Only some of what determines a dog’s personality is affected by its genetics, and very little by its breed. There is a lot of misunderstanding rife in the public about how much breed influences dog personality and as dog scientists we must be very careful we do not fuel this further. When testing puppies at such a young age, estimates of heritability are stronger as variability is reduced, but this is to be expected since the dogs have all shared the same life experience so far. When heritability of behaviour (and variance) is examined in older dogs, once environment has had a chance to influence them, it drops dramatically showing that early estimates are over-estimates. This is an inherent problem with drawing conclusions about genetic influence from variability shown under limited, standardised conditions.

No genetic influence was considered in this new version of the manuscript

Line 89: what information did you collect about maternal care levels and how?

Information on maternal care levels was collected by asking the owners about licking, nursing, contact, play, movement towards and away from the puppy...

Line 91 & 93: ‘stimulating’ and ‘non-stimulating’ both are 42%, so what were the remaining 16%?

We have corrected the wrong percentage reported in the text!

Line 112: this makes no sense “Scoring 300 the most suitable reaction, interval between behaviour options for each subtest were obtained dividing by 5, 4 or 3 depending of the number of observed reactions”. Please re-phrase this.

We re-phrased this as follows:

The most suitable reaction was scored 300, and the interval between behaviour options for each subtest was obtained by dividing by 5, 4 or 3, depending on the number of subtest options.
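One possible reading of this rule is sketched below. It assumes the options are equally spaced and that the interval is 300 divided by the number of options, so that the most suitable reaction receives 300; the function name `option_scores` and the resulting score values are illustrative assumptions and are not taken from the manuscript.

```python
def option_scores(n_options, top_score=300):
    """Return equally spaced scores for a subtest with n_options reactions.

    Assumption (not stated in the source): the interval between options is
    top_score / n_options, and the most suitable reaction gets top_score.
    """
    if n_options <= 0:
        raise ValueError("n_options must be positive")
    step = top_score / n_options
    # Option 1 is the least suitable reaction, option n_options the most.
    return [step * k for k in range(1, n_options + 1)]


# Subtests in the study had 5, 4 or 3 behaviour options.
for n in (5, 4, 3):
    print(n, option_scores(n))
```

Under this reading, a three-option subtest would use the values 100, 200 and 300, which matches the low, medium and high labels used elsewhere in the response.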

Statistical analysis: please use inter-observer reliability, instead of ‘variability’ as the two terms mean different things, and reliability is the most commonly used/understood. This also relates to your Abstract, where you state inter-observer variability as being low, but it would be much easier to understand if you said inter-observer reliability was high.

We replaced the term variability in the abstract and in the text

Please also state here what cut-offs you were prepared to consider acceptable in the K and W statistics: these decisions should always be made before you run your analysis. How were the scores distributed? Did you check for normality first?

We checked for normality, but the data were not normally distributed. K and W are non-parametric statistics, so normality of the scores is not required (nor is it required for the cumulative logit model). The K and W cut-offs were defined prior to running the statistical analysis, based on the values reported in the literature (references have been included in the text).
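To make the rank-based nature of W concrete, the following is a minimal sketch of Kendall's coefficient of concordance with the standard correction for ties, written in plain Python. It is not the authors' analysis code; the function names and the example scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of (t^3 - t) over groups of tied values for one rater."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kendalls_w(ratings):
    """Kendall's W with tie correction.

    ratings: list of m lists (one per rater), each holding n scores.
    Undefined (division by zero) if every rater gives all items the same score.
    """
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    ties = sum(tie_term(r) for r in ratings)
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Hypothetical scores from three assessors for five puppies.
scores = [
    [300, 200, 100, 300, 200],
    [300, 200, 100, 200, 200],
    [300, 100, 100, 300, 200],
]
print(round(kendalls_w(scores), 3))
```

Because the statistic is computed from ranks only, it makes no assumption about how the raw scores are distributed.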

Since you’re working with dogs from litters, the individual dogs are not independent of each other so ‘litter’ should be taken into consideration as a random effect in any analysis. As far as I can tell you did not control for litter in this analysis, which means you have not adjusted for pseudo-replication that using dogs of the same litter would cause.

We reanalyzed our data with “litter” as a random effect, and the results were adjusted for pseudo-replication. We are grateful for the suggestion.

How were the ‘complex indicators’ calculated? Please describe this.

Description was added in the text:

Complex indicators were calculated from the mean of each subtest as follows: the mean scores given by the observers for the social attraction + following subtests were used for the complex indicator “dog-human interaction”, and the mean scores given by the observers for the sum of the other subtests were used for the complex indicator “learning predisposition”. Scores were considered low (mean 100), medium (mean 200) and high (mean 300).

Considering the large number of comparisons made here I would also expect to see P-value correction to account for multiple testing.

As suggested, we have addressed the problem and report p-values corrected across the models, obtaining a new significance threshold of 0.0143 (0.10/7).
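The quoted threshold is consistent with dividing the significance level by the number of models, as in a Bonferroni correction. The response does not name the method, so treating it as Bonferroni is an assumption, and the p-values in the sketch below are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(0.10, 7)
print(round(threshold, 4))  # 0.0143

# Hypothetical p-values, only to show how the threshold would be applied.
p_values = [0.004, 0.020, 0.300]
significant = [p < threshold for p in p_values]
print(significant)  # [True, False, False]
```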

Table 3 & 4 should be merged into one. You should also provide the confidence interval for the K and W coefficients.

It is not possible to provide confidence intervals for the K and W coefficients since normality cannot be assumed.

Table 5 must report all values – not just ‘significant’ ones. It’s also important to include the confidence interval around the Beta estimate, as p-values alone are essentially meaningless.

We have modified the table and reported all values. Reporting Beta and its standard error is enough to quantify the effect size and its variability. P-values and confidence intervals are both functions of Beta and its standard error, thus both carry the same value in terms of inference.
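The relationship between Beta, its standard error and a confidence interval can be illustrated with a standard Wald interval. The sketch below assumes a large-sample normal approximation with z = 1.96 for a 95% interval; the function names and the values of Beta and the standard error are hypothetical and are not taken from Table 5.

```python
import math


def wald_interval(beta, se, z=1.96):
    """Wald confidence interval for a coefficient: beta +/- z * se."""
    return beta - z * se, beta + z * se


def odds_ratio_interval(beta, se, z=1.96):
    """Odds ratio and its interval, obtained by exponentiating the log-odds scale."""
    lower, upper = wald_interval(beta, se, z)
    return math.exp(beta), math.exp(lower), math.exp(upper)


# Hypothetical estimate from a proportional odds model.
beta, se = 0.8, 0.3
print(wald_interval(beta, se))
print(odds_ratio_interval(beta, se))
```

A reader given only Beta and its standard error can therefore reconstruct the interval, although reporting it directly saves that step.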

Line 165: The W values were high, but not ‘very high’ (>0.90 would be very high).

As reported in the literature, very high values are between 0.81 and 1.

Line 170: typo – ‘avaluated’

Amended

Line 175: typo – ‘indipendently’

Amended

Line 180: repeated word “in test social attraction test”

We have removed the sentence

Line 203: What does this mean: “positive influence on “learning predisposition” indicator in pups”?

We have removed the sentence since results were modified

You say in the abstract that “In order to better standardize the test, adjustments to the scoring protocol are recommended” yet I can’t find any such recommendations anywhere.

There’s one sentence on the study limitations, which isn’t really good enough.

We have recommended adjusting the scoring protocol, and the sentence was modified.

Reviewer #2: The current manuscript examines responses of 7-week old puppies to a series of behavioural tests, examining both interobserver reliability for scoring for the tests, and the effects of various puppy-related and environment-related variables on their responses. Examining early factors and their influence on puppy and later adult behaviour is an important topic. However, I have some questions about the overall objective of the current work as well as the approach that was taken. I have summarized these concerns below, and provided specific relevant comments below in the ‘general comments’ section

1. Based on the information provided in the introduction, the rationale for this particular study is unclear and needs strengthening. Why do we need to understand how different factors influence the behaviour of young puppies when previous studies have shown that performance on tasks at this age has poor predictive power for later behaviour? I recommend that the authors reframe the introduction to clearly outline what research on puppy testing has already been published and why this particular study is necessary and important – I’m assuming it is to understand sources of variability so that test performance can be improved, but this isn’t clear from what is written.

The introduction was modified according to reviewer comments.

2. It’s unclear why the authors selected to use these particular tests for testing the puppies, so the justification needs to be strengthened. Why are these areas of assessment important, and why did they select this particular test when it hasn’t been used in any previously published work? Also, the methods that were used for testing are not described in sufficient detail for proper assessment of rigour and for replication, and some of the descriptions for scoring in Table 2 are quite vague. Were these the final descriptions that were used for the observers during scoring?

We have improved the description in Table 2 and we have described in detail the methods. We have also explained the reason for selecting these particular tests.

3. The methods and results for the regression analysis are a bit confusing and this makes it difficult to properly assess the findings and conclusions for this study. Areas where more information is needed are described in more detail below. However, based on the information provided it appears that the analysis doesn’t account for clustering, which is critical when examining litters of puppies.

We reanalysed the data and included “litter” as a random effect and results were adjusted for pseudo-replication.
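One common way to write such a model, given here as a sketch rather than the authors' exact specification, is a cumulative-logit (proportional odds) model with a random intercept for litter. All symbols are notation assumed for illustration: Y_{ij} is the ordered score category of puppy i in litter j, \theta_k are the category thresholds, \mathbf{x}_{ij} holds the covariates (sex, litter size, environment, maternal parity, maternal behaviour), and u_j is the litter effect.

```latex
\operatorname{logit} P(Y_{ij} \le k)
  = \theta_k - \left( \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + u_j \right),
\qquad k \in \{\text{low}, \text{medium}\},
\qquad u_j \sim \mathcal{N}\!\left(0, \sigma_u^{2}\right)
```

Under this form, a large estimated \sigma_u^2 means that puppies from the same litter tend to score alike, which is the clustering the reviewer asked to be accounted for.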

General Comments

Data statement: The authors have indicated that all data are present in the manuscript, but only aggregate data is presented. The raw data is not available for review.

The raw data is now available for review

The manuscript requires further editing for grammar and spelling, particularly the discussion.

Grammar and spelling were edited

L16-19: The aims for the current study are unclear. If 7 weeks of age is too young to predict later temperament, why is it important to understand factors that influence behaviour at this age? Clarification is needed

We have clarified the aim of the study which is to evaluate inter-observer reliability and identify factors that influence behavior at this age. This information could be useful for behavioural science as well as for canine breeding management

L20: replace stimuli with tests or tasks

Amended

L21-22: I wonder if the suitability of the response depends on the purpose for which the puppy has been bred? Is there a way to word this that is descriptive rather than suggesting that a particular response is best?

This may well be true, but we have deleted the genetic part of the study. A behavioural response can be suitable or unsuitable; the test is performed to assess this.

Complex indicators: What is the rationale for combining particular tasks together into complex indicators. Is there some indication that responses on these tasks are related?

Many tests comprise several subtests; we considered some subtests to be more closely related to each other than to the others.

L27-30: The results for these complex indicators are not presented in the main body of the manuscript.

We have modified the presentation of these indicators.

Introduction: The intro is a bit confusing because the authors start out discussing the potential for early testing to predict later potential, but this does not actually relate to their objective of looking at factors that influence the behaviour of puppies during early behaviour testing. The rationale for this particular study needs to be explained more clearly.

We have modified the introduction accordingly

4-46: The following sentences aren't really necessary for the introduction: "Analyses of behaviour normally involves measuring frequency, duration and latency of specific behaviours [3]. Rater-coding is normally done on a predetermined scale that may have only 3, 5, or 7 points [4]."

We have deleted these sentences

L47: Can you clarify why increased motivation to approach might be useful for these assessments?

We have modified the sentence as follows:

The period between 6 and 8 weeks of development may facilitate certain testing since puppies haven’t fully developed the fear response and they can be more easily handled by unknown people

L50-53: Given the objective of this study, it is important to summarize the literature on puppy assessments that have been published to date. What have people tried, what did they find and why is it inadequate and requiring further study? There is quite a large literature on this topic (both general and applied to selection for certain programs) and few studies are cited above but very little detail is provided to develop the rationale for the current study.

We have summarized the literature on puppy assessments in order to develop the rationale for the current study

L60-61: Please clarify that these tests are used in practice for assessment, and that these aren't methods that have been used and assessed in the literature. What is your rationale for using these tests rather than other tests that have been used and assessed previously in the peer-reviewed literature?

We have explained this point more clearly

L63: Missing space in "outthe"

The space was inserted

L63-65: Is it necessary to provide this information when the authors chose not to use these categories? Instead please provide further justification for the approach that you chose to go with (ie, genetic relatedness). If you do decide to keep this information in the paper, please state clearly why this approach is inappropriate for your purposes.

We have decided to not include the approach of genetic relatedness

L69-70: The objectives for the current study could use some clarification. What was your rationale for testing these specific factors, and did you have any predictions about how these factors would affect puppy behaviour before hand?

We added the rationale for testing these specific factors

L74: Add an 's' to ethics

Amended

L85: Add an 's' to breeds

Amended

L89-90: How, specifically, was the level of maternal care evaluated in the current study?

We added: presence of licking, nursing, contact, play...

L91-93: This only accounts for 84% of observations - how were the others classified?

We have corrected the mistake

L95-96: Were the puppies tested at a particular time of day?

They were tested in the late afternoon between 4.00 and 6.00 pm, we added this information

Table 1: Please expand this table to list the specific litters and number of puppies per litter to provide a better representation of the sample.

Table was expanded

L110-113: I'm having trouble understanding the methods described here - please clarify by providing further details

Further details have been provided

Table 2: The rows on the table don't line up properly so it is difficult to determine which subtests the responses align with.

The table was improved

Table 2: Fix spelling for Appearance

Amended

Table 2: Since this isn't a series of tests that has been previously published in the literature, please describe the methods in sufficient detail that they can be replicated.

We have better described the puppy response alternatives

Table 2 - Sudden Appearance Test: What does curiosity refer to specifically? And for the final category, what constituted investigation?

We have modified the sentence as follows:

Puppy moves toward the umbrella and investigates in an excited way (sniffing and wagging his tail) (score 300)

Table 2 - Noise test: Again, what specific behavioural responses was indicative of curiosity?

We have modified the description: Puppy listens, detects sound and moves to the sound source

L126-127: Please provide basic information and a reference for interpretation of these values. There are standard cutoffs for interpretation that are commonly used in the literature.

Reference for interpretation of these values was added
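The response does not name the reference that was added. As an illustration only, the sketch below assumes the commonly cited Landis and Koch (1977) bands for kappa, which are consistent with the "moderate" and "0.81 to 1" ranges mentioned elsewhere in this correspondence. The function name `interpret_kappa` is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a verbal label.

    Assumption: Landis and Koch (1977) bands; the manuscript may cite
    a different scale.
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    # (upper bound of band, label), checked in ascending order
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


print(interpret_kappa(0.50))
print(interpret_kappa(0.85))
```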

L127-129: Further details are needed on the methods used for statistical modelling. Does mean score refer to the mean of all three observers, and/or were the scores converted in some manner to categories prior to ordinal logistic regression? Also, what methods, if any, were used to account for clustering with litter (e.g., was litter included as a random effect)? We can expect that litter effects are likely quite large, so realistically your sample size is reduced to the number of litters unless you can account for clustering in some way.

We reanalyzed the data with “litter” as a random effect, and results were adjusted for pseudo-replication. We first calculated the mean of the scores assigned by the 3 observers for each subtest. The subtest means were then averaged to compute the complex indicators, and each result was assigned to the closest category (low, medium or high).
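The aggregation described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the observer scores are made-up numbers, the names `aggregate_indicator` and `CATEGORIES` are hypothetical, and the category anchors of 100, 200 and 300 are taken from the re-categorization mentioned later in this letter. How ties between two categories were resolved is not stated, so the sketch simply takes the lower one.

```python
from statistics import mean

# Assumed category anchors: low (100), medium (200), high (300).
CATEGORIES = {100: "low", 200: "medium", 300: "high"}


def aggregate_indicator(subtest_scores):
    """Average observer scores per subtest, then average the subtests.

    subtest_scores maps a subtest name to the scores given by the observers.
    Returns the aggregate value and the label of the closest category.
    """
    subtest_means = [mean(scores) for scores in subtest_scores.values()]
    indicator = mean(subtest_means)
    # min() keeps the first (lowest) anchor when two are equally close.
    nearest = min(CATEGORIES, key=lambda level: abs(level - indicator))
    return indicator, CATEGORIES[nearest]


# Hypothetical scores from three observers for one puppy.
response_to_person = {
    "social_attraction": [300, 300, 240],
    "following": [200, 200, 200],
}
value, label = aggregate_indicator(response_to_person)
print(value, label)
```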

Table 3: Why were standard deviations calculated? This seems unnecessary when Kappa and Kendall's are presented.

Standard deviations were calculated to show that the variation among the scores assigned by the observers was smaller than the difference between the scores of different behavioural responses. This is why the values obtained with Kendall's W were high.

L150-152: Please move this information to the M&M.

Amended

Does this mean that scores were re-categorized as 0-100, 101-200, 201-300 prior to analysis? What was the rationale for this decision?

Scores were re-categorized as low (100), medium (200) and high (300). Since the variables are non-quantitative, they were re-categorized arbitrarily into these 3 groups.

L153-157: The figures alone do not provide sufficient information for evaluating the results from these models. Please provide the relevant values to support the comparisons that are made within the text or in Table 5.

Relevant values were provided to support the comparisons made in the table

Table 5: What is your justification for using a significance level of 0.1? Info should be provided in M&M

We have included this information in M&M. We have selected this level because of the small sample size.

Table 5: I'm struggling a bit to interpret the information provided for differences amongst breed groups. For example, what was the overall value for genetic group and which referent are these in comparison to? Or was each group looked at separately in comparison to others not in that group? Apologies if I'm misinterpreting how this was done but it isn't clear to me.

We have decided to not consider breed groups.

Table 5: Sudden appearance test – typo

Amended

L164: The agreement between observers suggests reasonable agreement for scoring, but I'm not sure how this relates to clear representations of puppy behaviour. Please explain

The agreement between observers implies that they observed similar behaviour. If the tests were not able to discriminate between behaviours, observers would not be able to score the correct behaviour reliably.

L165: What are your interpretations of these statistics based on - please provide references.

References were provided

L170: While the Kendall's W values are good, the Kappa values are in the moderate agreement range. It might be worth discussing why both were used, and differences in interpretation between the two.

We have modified the discussion according to this comment. Kappa measures exact agreement: judges either assign the same score or they do not agree with each other. Kendall's W, in contrast, also takes into account how far the scores diverge when expressing agreement.
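The contrast between the two statistics can be made concrete with a small sketch. It uses the standard textbook formulas for Fleiss' kappa and Kendall's W (with average ranks and a tie correction), not the authors' software, and the ratings are invented: one hypothetical rater always scores one category higher than the other two. The function names are illustrative.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i][r] is the category rater r gave subject i.

    Undefined (division by zero) if every rating falls in one category.
    """
    n_subjects, n_raters = len(ratings), len(ratings[0])
    counts = [Counter(row) for row in ratings]
    categories = {score for row in ratings for score in row}
    # Observed agreement per subject, then averaged.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall category proportions.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def average_ranks(values):
    """Rank values from 1, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's coefficient of concordance with tie correction."""
    n, m = len(ratings), len(ratings[0])
    rank_sums = [0.0] * n
    tie_term = 0
    for r in range(m):
        column = [row[r] for row in ratings]
        for i, rank in enumerate(average_ranks(column)):
            rank_sums[i] += rank
        tie_term += sum(t ** 3 - t for t in Counter(column).values())
    mean_sum = sum(rank_sums) / n
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_term)


# Hypothetical data: the third rater is always one category higher.
ratings = [[1, 1, 2], [2, 2, 3], [3, 3, 4], [4, 4, 5]]
print(round(fleiss_kappa(ratings), 3), round(kendalls_w(ratings), 3))
```

In this made-up case the raters never all assign the same category, so kappa is low, yet they order the puppies identically, so W reaches its maximum.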

L189: Can you be more specific about what aspects of behaviour were affected in this previous study?

Amended

Discussion: The discussion might benefit from additional consideration of how the current puppy tests are similar/dissimilar from previous research in this area.

We have modified the discussion accordingly.

L215-216: The final sentence of the conclusion is not supported by the data presented.

We have modified the final sentence

L268: This reference is not available at the link provided.

We have replaced reference

Figure 6: Should the blue label read Unsuitable or Unstimulating?

We have deleted the figure since the results have been modified by further statistical analysis

Decision Letter 1

Simon Clegg

17 Jun 2020

PONE-D-20-02339R1

Behavior test for seven-week old puppies ( Canis familiaris ): inter-rater reliability and factors associated with test performance

PLOS ONE

Dear Dr. Alberghina

Thank you for submitting your manuscript to PLOS ONE. A few minor revisions have been suggested by the reviewers. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Many thanks for resubmitting your manuscript to PLOS One

The manuscript has been reviewed and the reviewers have requested some further minor revisions

If you could make the minor revisions, and write a response to reviewers, then your manuscript can be reviewed rapidly upon re-submission

I wish you the best of luck with your revisions

Hope you are keeping safe and well in these difficult times

Thanks

Simon

==============================

Please submit your revised manuscript by Aug 01 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Simon Clegg, PhD

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is the 2nd round of reviewing for this manuscript. The authors have done well to address extensive comments from both reviewers, which have included amending the manuscript title and re-doing their statistical analysis. Much of the text has been re-written or amended so I have had to read it again with fresh eyes. Specific comments for minor improvements are given below.

Abstract:

The age of the dogs should be stated in the abstract (I know it’s in the title, but it still needs to be in the abstract).

Perhaps the term ‘aggregate indicators’ is a more precise description than ‘complex indicators’?

The abstract is much improved, well done to the authors.

The Introduction is much improved and the aims/purpose of the study much clearer.

Methods, some minor comments:

Whilst the experimenter and camera person may have been male, please consider using gender neutral terminology to describe the protocol i.e. ‘A second person filmed the test for subsequent video analysis’ (first paragraph under Procedure), and in Table 2 change ‘him’ to ‘them’ (in the response to Following)

The way maternal care was captured has not been described still. What exactly did breeders score? This needs to be described if anyone were to replicate (or improve) upon the way it was judged.

Whilst I do understand what you mean now when you say this, I think it can still be made clearer for a new reader: ‘The interval between behavioural responses for each test was obtained dividing by 5, 4 or 3 depending of the number of observed reactions.’

Perhaps this wording will help: As the maximum score was 300, the score assigned to the behavioural responses for each test was obtained by dividing 300 by 5, 4 or 3 depending on the number of response categories for the test.
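The arithmetic in this suggested wording can be sketched as below. It assumes that the responses of a subtest are scored in equal steps up to the maximum of 300, so that the lowest response receives one step rather than zero; the source does not state this explicitly, and the function name `response_scores` is hypothetical.

```python
def response_scores(n_categories, max_score=300):
    """Scores for a subtest with n_categories ordered responses.

    Assumption: equal steps of max_score / n_categories, ending at max_score.
    """
    step = max_score / n_categories
    return [step * (i + 1) for i in range(n_categories)]


for k in (5, 4, 3):
    print(k, response_scores(k))
```

With three response categories this gives 100, 200 and 300, which matches the low, medium and high levels mentioned in the authors' earlier response.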

Results:

Here: ‘Litter size significantly influenced the social attraction (P=0.03), following and sudden appearance (P=0.06)’ it would be good to clarify that the p value was the same for Following and Sudden appearance as it looks like the value simply wasn’t stated for Following.

P=0.06 is being stated as ‘significant’ and in the tables there are (slightly hard to find) notes to say alpha = 0.10 but this isn’t stated anywhere in the text with justification for this. I know that this was done due to the small sample size from the replies to reviewers comments, but new readers also need to know this.

Discussion:

Whilst these statements are true, they are just hanging there as their own isolated paragraph and aren’t connected to anything else: “Dogs raised in domestic environments were less likely to develop fear and aggression towards unfamiliar people compared to dogs raised in non-domestic environments [24]. Sufficient exposure to relevant stimuli during the early socialisation period appears to be associated with lower fearfulness and aggression in dogs [25].”

Conclusion: I am not convinced that you can recommend the test for use based upon the results of this study alone as you do here “This simple test can be recommended for all puppies.”. This study shows that there is sufficient inter-rater reliability in the test, and that it captures some variance that is associated with other factors, but what are you recommending it for use for? I think the most you can conclude is that you have successfully designed a score system for the test that can be used reliably by different people and for quantitative analysis.

Reviewer #3: I was invited as a second reviewer, presumably due to the unavailability of the original reviewer (?). I thoroughly enjoyed reading the manuscript, and found it very interesting. It appears that the intense first review comments have mostly been addressed. This is a very interesting paper, and I look forward to seeing it published. Most of my comments are only minor.

This appears to be a major development in the behavioural assessment field, and one which I was privileged to read. I praise the authors on an excellent study, and a good manuscript, and offer my thanks for this study, and offer my best wishes for the future, and for your safety in the coronavirus pandemic.

You have a really nice abstract

Introduction

The introduction is nice, clear and well written. It has a nice flow and clearly states the aims of the study. Perhaps a little bit of extra detail on the modification may be useful, but I do not feel strongly about that, and I will leave that up to your judgement.

The period between 6 and 7 weeks of development may facilitate certain testing, since puppies haven’t fully developed the fear imprinting response and they can be more easily handled by unknown people [3]. Please add in the comma between testing and since.

Methods

Again, generally well written, but a few minor points

I feel that the methods would be better written in a neutral gender. It almost sounds like the only way it would work is with a male.

You mention the maternal experience and the environment. How was this analysed? Was this by yourselves, or the breeder, and if it is the latter, how can you be sure that it was honest or standardised?

I particularly struggled to follow the following ‘The interval between behavioural responses for each test was obtained dividing by 5, 4 or 3 depending of the number of observed reactions’. Is it possible to reword it please?

Table 1- could you group these by size so it is easier to read?

Results

Some of the tables would be nicer as a proper table rather than as it is – i.e. with lines like Table 1

Variance due to the litter was very low in all tasks, except the sudden appearance and noise subtests. Add in the comma between tasks and except

You have a P value of 0.06 being statistically significant. It is more common to use 0.05. Is there a reason why this value was chosen? I feel that this should be stated somewhere in the text, probably the methods (unless I missed it)

Discussion

Again, I really like the discussion

There are a few random statements which just appear dumped in certain places which could be better integrated elsewhere in the discussion

I think that your conclusion is good, but maybe over egging it somewhat. Perhaps consider toning it down.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jul 29;15(7):e0236271. doi: 10.1371/journal.pone.0236271.r004

Author response to Decision Letter 1


1 Jul 2020

29/06/2020 Messina,

Dear Editor and Reviewers,

Thank you very much for reviewing our manuscript titled “Behavior test for seven-week old puppies (Canis familiaris): inter-rater reliability and factors associated with test performance”.

We are happy to know that the reviewers found the manuscript improved. We have incorporated all of the reviewers’ useful suggestions and clarified the text where needed. We are confident that the new version of the manuscript is greatly improved.

Please, find below the referees’ comments (black font) and our responses (blue font) inserted after each comment.

Looking forward to hearing from you soon.

Best regards,

Daniela Alberghina

Reviewers' comments

Abstract:

The age of the dogs should be stated in the abstract (I know it’s in the title, but it still needs to be in the abstract).

Perhaps the term ‘aggregate indicators’ is a more precise description than ‘complex indicators’?

The age of the dogs is now stated, and we agree with the reviewer that the term “aggregate” is more appropriate than “complex”.

The abstract is much improved, well done to the authors.

Many thanks

The Introduction is much improved and the aims/purpose of the study much clearer.

Methods, some minor comments:

Whilst the experimenter and camera person may have been male, please consider using gender neutral terminology to describe the protocol i.e. ‘A second person filmed the test for subsequent video analysis’ (first paragraph under Procedure), and in Table 2 change ‘him’ to ‘them’ (in the response to Following)

Amended

The way maternal care was captured has not been described still. What exactly did breeders score? This needs to be described if anyone were to replicate (or improve) upon the way it was judged.

We asked the breeders whether adequate maternal behaviour was present (e.g. mother-pup interaction during feeding sessions, licking, contact, play, movement towards and away from the puppy), and their responses were classified as adequate or inadequate maternal behaviour.

Whilst I do understand what you mean now when you say this, I think it can still be made clearer for a new reader: ‘The interval between behavioural responses for each test was obtained dividing by 5, 4 or 3 depending of the number of observed reactions.’

Perhaps this wording will help: As the maximum score was 300, the score assigned to the behavioural responses for each test was obtained by dividing 300 by 5, 4 or 3 depending on the number of response categories for the test.

Thank you for the suggestion!

Results:

Here: ‘Litter size significantly influenced the social attraction (P=0.03), following and sudden appearance (P=0.06)’ it would be good to clarify that the p value was the same for Following and Sudden appearance as it looks like the value simply wasn’t stated for Following.

P=0.06 is being stated as ‘significant’ and in the tables there are (slightly hard to find) notes to say alpha = 0.10 but this isn’t stated anywhere in the text with justification for this. I know that this was done due to the small sample size from the replies to reviewers comments, but new readers also need to know this.

We have added this information in the text: Due to the small sample size, P values <0.10 were considered significant.

Discussion:

Whilst these statements are true, they are just hanging there as their own isolated paragraph and aren’t connected to anything else: “Dogs raised in domestic environments were less likely to develop fear and aggression towards unfamiliar people compared to dogs raised in non-domestic environments [24]. Sufficient exposure to relevant stimuli during the early socialisation period appears to be associated with lower fearfulness and aggression in dogs [25].”

We have added an additional sentence to link these two statements to our findings.

Conclusion: I am not convinced that you can recommend the test for use based upon the results of this study alone as you do here “This simple test can be recommended for all puppies.” This study shows that there is sufficient inter-rater reliability in the test, and that it captures some variance that is associated with other factors, but what are you recommending it for use for? I think the most you can conclude is that you have successfully designed a score system for the test that can be used reliably by different people and for quantitative analysis.

Thank you for the suggestion! We have modified the conclusions accordingly.

Reviewer #3: I was invited as a second reviewer, presumably due to the unavailability of the original reviewer (?). I thoroughly enjoyed reading the manuscript, and found it very interesting. It appears that the intense first review comments have mostly been addressed. This is a very interesting paper, and I look forward to seeing it published. Most of my comments are only minor.

Thank you for your kind comments

This appears to be a major development in the behavioural assessment field, and one which I was privileged to read. I praise the authors on an excellent study, and a good manuscript, and offer my thanks for this study, and offer my best wishes for the future, and for your safety in the coronavirus pandemic.

We thank you and wish you the same for your safety.

You have a really nice abstract

:)

Introduction

The introduction is nice, clear and well written. It has a nice flow and clearly states the aims of the study. Perhaps a little bit of extra detail on the modification may be useful, but I do not feel strongly about that, and I will leave that up to your judgement.

We believe that the modifications are described in sufficient detail, but we would welcome any further suggestions on this point.

The period between 6 and 7 weeks of development may facilitate certain testing, since puppies haven’t fully developed the fear imprinting response and they can be more easily handled by unknown people [3]. Please add in the comma between testing and since.

Amended

Methods

Again, generally well written, but a few minor points

I feel that the methods would be better written in a neutral gender. It almost sounds like the only way it would work is with a male.

We have rewritten the methods in gender-neutral terms.

You mention the maternal experience and the environment. How was this analysed? Was this by yourselves, or the breeder, and if it is the latter, how can you be sure that it was honest or standardised?

We cannot be sure of the information about adequate maternal behaviour. For this reason, the original sentence in the conclusion section was: In this study, our results did not show an effect of adequate maternal behaviour on performance response to subtests. This could be due to the subjectivity of the breeders asked to judge this behaviour.

We have added the following sentence: For future studies, maternal behaviour should also be recorded directly to determine whether it has an impact on responses to these tests. More specifically, in order to have a better standardization of observed behaviours, maternal behaviour should be recorded 1 day per week continuously every second hour over a 24-hour period during the first 3 weeks postpartum, as described by Foyer et al. [20].

I particularly struggled to follow the following ‘The interval between behavioural responses for each test was obtained dividing by 5, 4 or 3 depending of the number of observed reactions’. Is it possible to reword it please?

Amended

Table 1- could you group these by size so it is easier to read?

Amended

Results

Some of the tables would be nicer as a proper table rather than as it is – i.e. with lines like Table 1

Amended

Variance due to the litter was very low in all tasks, except the sudden appearance and noise subtests. Add in the comma between tasks and except

Amended

You have a P value of 0.06 being statistically significant. It is more common to use 0.05. Is there a reason why this value was chosen? I feel that this should be stated somewhere in the text, probably the methods (unless I missed it)

We have added this information to the text.

Discussion

Again, I really like the discussion

There are a few random statements which just appear dumped in certain places which could be better integrated elsewhere in the discussion

I think that your conclusion is good, but maybe over-egging it somewhat. Perhaps consider toning it down.

We have modified the conclusion according to your suggestion.

Decision Letter 2

Simon Clegg

6 Jul 2020

Behavior test for seven-week old puppies (Canis familiaris): inter-rater reliability and factors associated with test performance

PONE-D-20-02339R2

Dear Dr. Alberghina

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Simon Clegg, PhD

Academic Editor

PLOS ONE

Additional Editor Comments:

Many thanks for submitting your manuscript to PLOS One

I have read through the manuscript, and as you have addressed all comments and the manuscript reads well, I have recommended your manuscript for publication

You should hear from the Editorial Office

It was a pleasure working with you and I wish you all the best for your future research

Hope you are keeping safe and well in these difficult times

Thanks

Simon

Acceptance letter

Simon Clegg

8 Jul 2020

PONE-D-20-02339R2

Behavior test for seven-week old puppies (Canis familiaris): inter-rater reliability and factors associated with test performance

Dear Dr. Alberghina:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Simon Clegg

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Data

    (XLSX)

    S1 Fig

    (PNG)

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
