Abstract
Expert opinions admitted by courts are not always valid and reliable. However, we know little about how indicators of opinion quality affect the persuasiveness of an expert. In this study 25 Australian magistrates and 22 jury-eligible lay people rated the persuasiveness (via credibility, value and weight) of either a high- or a low-quality expert opinion. Opinion quality was determined using attributes specified in the Expert Persuasion Expectancy (ExPEx) framework: Field, Specialty, Ability and Trustworthiness. Both magistrates and jurors were significantly more persuaded by the high- than the low-quality expert opinion. Magistrates were also significantly more sceptical of the expert opinion than lay people, and when given the opportunity sought information that was logically relevant to their decision. These results suggest that magistrates can differentiate between high- and low-quality expert opinions, but it is unclear whether the information they need for the task is actually available for use during trials.
Key words: expert evidence, expert testimony, forensic science, judges, jury decision-making, persuasion
Expert evidence is regularly used to assist courts with decision-making (Gross, 1991; Jurs, 2015). However, the quality of expert opinions has been, and continues to be, of significant concern (Edmond & San Roque, 2012; Findley, 2008; Martire & Edmond, 2016; National Research Council, 2009; President’s Council of Advisors on Science and Technology, PCAST, 2016; Risinger, Denbeaux, & Saks, 1989). The quality of expert opinions in a range of disciplines has been questioned, including the forensic sciences, mental health diagnoses, medical causation, gender discrimination and eyewitness reliability among others (e.g. Bernstein, 1990; Cole, 2003; Cunliffe & Edmond, 2013; Deitch, 2009; Edens et al., 2012; Imwinkelried, 2009; Martire & Kemp, 2011; Monahan, Walker, & Mitchell, 2008; Taupin, 2004). Even so, the opinions of experts from these and other areas remain persuasive to factfinders.
Courts worldwide have been influenced by the mistaken and inaccurate opinions of expert witnesses. False and flawed expert testimony has contributed to approximately 60% of known wrongful convictions identified by the United States Innocence Project (Garrett, 2017; Garrett & Neufeld, 2009), one third (31%) of 71 Australian exonerations (Dioso-Villa, 2015) and one quarter (24%) of exonerations in the United States National Registry of Exonerations (Gross & Shaffer, 2012). Indeed, discredited and unvalidated forms of expert evidence continue to be admitted and relied on by courts despite authoritative criticism (National Research Council, 2009; PCAST, 2016; Skene, 2018). This leaves factfinders with the challenging task of differentiating between witnesses who are genuine experts and those who are not.
In Australia, expert evidence is most likely to be evaluated by judicial officers rather than lay juries. Of 592,455 finalised defendants in 2017–2018, 92% (or 545,251) appeared in the Magistrates’ Courts (Australian Bureau of Statistics, ABS, 2019). Magistrates’ Courts are summary courts, meaning that magistrates rather than juries determine the verdict. They also generally consider matters of a less serious nature than the Higher Courts. However, this does not remove the need for expert witnesses. In 2017–2018, Australian Magistrates’ Courts resolved 40,576 theft offences, 56,638 illicit drug offences and 71,405 regulatory driving offences (ABS, 2019). These matters often require the testimony of, for example, fingerprint analysts, forensic chemists, pharmacologists or toxicologists to help establish the fact of an offence or the identity of the perpetrator. Thus it is likely that magistrates regularly hear expert testimony. This makes it important to understand how magistrates evaluate expert opinion evidence and whether their evaluations can be improved.
To date, research examining how judges and magistrates evaluate expert opinion quality suggests that their performance is likely to be imperfect. While there is some evidence that judges evaluating scientific opinions have a good understanding of some key indicators of scientific reliability (i.e. peer review and general acceptance; Gatowski et al., 2001) and of some types of evidence (i.e. mitochondrial DNA; Hans, 2007), weaknesses have also been found. Judges frequently make logical errors (i.e. the prosecutor’s fallacy; De Keijser & Elffers, 2012), mistakenly believe that scientific knowledge can and should be categorical or certain (Faigman, 2006), misunderstand falsifiability and error rate (Gatowski et al., 2001) and are unable to differentiate valid from invalid research (Kovera & McAuliff, 2000). These failures are likely to impair the assessment of expert quality. However, scientific validity and reliability are not the only considerations relevant to the quality of an expert opinion.
At least eight attributes have been identified as logically relevant to determining the quality of an expert opinion (Martire, Edmond & Navarro, 2020; Walton, 1997). These are: Foundation, Field, Specialty, Ability, Opinion, Support, Consistency and Trustworthiness. The indicators of scientific validity and reliability described above relate to the Foundation and Ability attributes. Specifically, Foundation covers field, discipline and technique validity and reliability (e.g. error rate, falsifiability, study design), while Ability relates to the personal proficiency or competence of the witness. Field and Specialty relate, respectively, to the general and specific training, study and experience of the witness relevant to their opinion. Opinion is about the content, conservatism and comprehensibility of the opinion expressed. Support includes the evidentiary basis for, and logic of, the opinion. Consistency concerns whether other experts agree with the opinion. Finally, Trustworthiness incorporates the bias, honesty and conscientiousness of the witness. These attributes have been formalised in the Expert Persuasion Expectancy (ExPEx) framework (Martire et al., 2020) as follows:
Foundation – Does training, study or experience in the field F support assertions like A?
Field – Does witness W have training, study or experience in the field F?
Specialty – Does W have training, study or experience specific to assertions like A?
Ability – Does W provide assertions like A accurately and reliably?
Opinion – Does W convey A clearly, and with necessary qualifications?
Support – Does W rely on evidence in making A?
Consistency – Is A consistent with what other experts assert?
Trustworthiness – Is W personally reliable as a source?
At present, we have only limited information about the extent to which magistrates and judges attend to and value these logically relevant attributes when assessing the quality of an expert opinion. Champagne, Shuman, and Whitaker (1990) surveyed 10 United States judges who noted a diverse array of attributes relevant to expert quality. These included: credentials (Field or Specialty), bias (Trustworthiness), methodology (Foundation or Support), communication skills (Opinion) and experience (Field or Specialty). Indeed, only one reported attribute clearly fell outside the ExPEx framework – witness demeanour. However, it is not clear whether the attributes that judges believe to be relevant to quality actually influence their evaluation of an expert opinion.
More recently, Tadei, Finnila, Reite, Antfolk, and Santtila (2016) surveyed 87 judges in Finland to explore how they determined the quality of an expert opinion. Judges rated the importance of seven listed indicators of reliability: falsifiability (Foundation), error rate (Foundation or Ability), peer-reviewed research (Field or Specialty), scientific acceptance (Foundation), practical acceptance (Consistency), work experience (Field or Specialty) and research activity (Field or Specialty). They were also asked to read five vignettes and note any questions they would ask to evaluate the reliability of the expert opinion in the scenario.
The judges’ ratings showed that work experience was seen as the most important of the listed attributes for determining an expert’s reliability. This was followed by error rate, practical acceptance, scientific acceptance, peer-reviewed research, falsifiability and research activity. While 54% of judges also posed questions about error rate in the case scenarios, most did not ask about the other listed indicators. Instead, they wanted to know more about the opinion (83%), its basis (77%) and the supporting research (56%). These questions relate to the Opinion, Support and Foundation attributes in ExPEx, respectively.
Overall then, we have some preliminary evidence about how judges believe they assess the quality of an expert opinion (Champagne et al., 1990). We also know a little about the information they might seek to complete their assessments (Tadei et al., 2016), and that much of this information is logically relevant to determining the quality of an expert opinion. However, we do not know whether this logically relevant information actually affects the decision-making of judges when it is available.
In this article we examine whether the decision-making of magistrates is affected by attributes logically relevant to the quality of the expert opinion. Specifically, we examine whether magistrates consider strong expert opinions more persuasive than weak ones when quality is manipulated via ExPEx attributes. We also compare the performance of magistrates to that of lay people for reference. We predict that magistrates and lay people will be significantly more persuaded by high- rather than low-quality expert opinion evidence when operationalised in terms of ExPEx attributes.
In addition, following from Tadei et al. (2016), we examine whether magistrates seek information that is logically relevant to their assessment of expert opinion quality. If magistrates request information that is within the ExPEx framework, rather than outside of it, this suggests that magistrates know which attributes of an expert opinion should be taken into account. It would also indicate that the ExPEx framework usefully represents the informational needs of judges.
Method
Design
We used a 2 (sample: MTurk vs. magistrate) × 2 (expert quality: high vs. low) between-subjects quasi-experimental design. Participants were randomly allocated to expert quality condition, while sample was determined by recruitment source. The dependent variable was persuasiveness, rated from 0 to 100. Data and materials for this study can be accessed in Supplementary Materials at https://tinyurl.com/ybxnh7x3. Please note that the demographic information is abridged in the publicly available dataset to preserve the anonymity of respondents. After all exclusions, the allocation to condition was as follows: high-quality magistrates n = 10, high-quality MTurk n = 13, low-quality magistrates n = 15, low-quality MTurk n = 9.
Participants
Magistrates
Participants were judicial officers attending a conference of magistrates in Australia. Thirty-one conference delegates voluntarily completed the study at the beginning of a training session. Only respondents reporting that they were judicial officers were retained in the sample (n = 25); however, not all of these participants provided responses to all questions. Most identified as male (68.0% of 25 responding) and identified as either Caucasian (88.0%) or Aboriginal/Torres Strait Islander (8.0%). Almost all obtained their legal qualifications in Australia (95.5% of 22 responding) and had an average of 34.6 years of legal experience (SD = 8.91, range = 12.00–45.00, n = 22). The sample was overwhelmingly composed of magistrates (90.0% of 20 responding) who had been in the position for an average of 10.22 years (SD = 9.73, range = 1–31 years, n = 22). All participants reported that English was their first language.
MTurk
A size-matched sample of lay participants was recruited online through Amazon Mechanical Turk (MTurk). To ensure respondents were human, all participants completed a CAPTCHA verification (Von Ahn, Blum, Hopper, & Langford, 2003). Participants were compensated US$1.00 for their time. Only participants who (a) reported being 18 years or older, (b) were jury eligible, and (c) passed attention-check items were retained in the sample. Of 27 respondents, three were excluded for failing Criterion (b), and two failed Criterion (c), leaving 22 respondents in the final sample. The majority of lay participants identified as male (68.2%); their mean age was 34.68 years (SD = 8.96, range = 21.00–58.00). The majority (77.3%) identified as Caucasian, and 40.9% reported college/university as their highest level of education. Almost three quarters of the sample had been called for jury duty (72.7%), and 27.3% had jury experience. All participants were native English speakers.
Materials
Expert report
Participants were provided with a brief statement describing a forensic gait analyst and their opinion. Participants in the ‘high’-quality condition read a version of the expert report where the analyst was logically strong on all eight ExPEx attributes. Participants in the ‘low’-quality condition read a version of the report where the analyst was logically weak on the Field, Specialty, Ability and Trustworthiness ExPEx attributes, but strong on Foundation, Opinion, Support and Consistency. This reflects the real-world situation where an expert opinion is likely to include some areas of strength and some areas of weakness. The attributes we manipulated also overlap with those reported by judges as important for their evaluations of expert quality. See Table 1 for verbatim text for strong and weak report versions.
Table 1.
Strong and weak versions of ExPEx attributes.
| ExPEx attribute | Expert opinion version (strong/weak) |
|---|---|
| Foundation | An extensive review of the medical literature reveals that formal training, study and experience working as a clinical podiatrist allows for the identification of a perpetrator through the comparison of (a) gait patterns captured on CCTV with (b) the gait of suspected offenders. |
| Field | Dr X has formal training, study and experience working as a clinical podiatrist/a hand surgeon. |
| Specialty | Dr X has not undertaken training courses specifically instructing how to make comparisons from low-quality CCTV footage to high-quality suspect footage for the purpose of gait analysis. |
| Ability | Dr X has been tested to examine whether he is accurately able to match suspects to perpetrators on the basis of a gait comparison using CCTV footage. Dr X performs the task very well and rarely makes mistakes/performs the task very poorly and commonly makes mistakes. |
| Opinion | In his report Dr X gave the opinion that he is extremely confident that the perpetrator from the CCTV and the suspect are the same person given the observed similarities in their gait. |
| Support | When asked by the Court, Dr X provided extensive evidence in support of this opinion. |
| Consistency | The opinion expressed by Dr X was independently peer reviewed by two other clinical podiatrists with formal training, study and experience. These clinical podiatrists reached the same opinion as Dr X. |
| Trustworthiness | Dr X is a court appointed expert in this case. He has worked for both the prosecution and the defence in the past showing that he does not have a pro-prosecution or pro-defence bias in his work./Dr X is known as a ‘hired gun’ and only testifies for the prosecution. |
Note: ExPEx = Expert Persuasion Expectancy.
ExPEx attributes
Participants answered one question about each of the eight ExPEx attributes to serve as a manipulation check (see Table 2). Questions were presented in a random order and were answered on a continuous scale from 0 to 100 with anchors specified in Table 2. Magistrates completed a pen-and-paper version of the study and were asked to place an ‘x’ on a straight line at the point that most accurately expressed their belief. The line was 100 mm in length, and responses were measured and converted to a number out of 100. Lay participants completed the study online via Qualtrics and provided their response by moving on-screen sliders. The sliders were set to ‘50’ by default and had to be moved to progress to the next item.
Table 2.
ExPEx attribute and persuasiveness questions.
| ExPEx attribute | Question | Scale anchors |
|---|---|---|
| Foundation | Does training, study, and experience in clinical podiatry support assertions that the perpetrator from the CCTV and the suspect are the same person? | Not at all to Definitely |
| Field | Does Dr X have training, study, and/or experience in clinical podiatry? | Not at all to Definitely |
| Specialty | Does Dr X have training, study, and/or experience specific to making assertions that the perpetrator from the CCTV and the suspect are the same person? | Not at all to Definitely |
| Ability | Does Dr X make assertions that the perpetrator from the CCTV and the suspect are the same person accurately and reliably? | Not at all to Definitely |
| Opinion | Did Dr X convey their assertion that the perpetrator from the CCTV and the suspect are the same person clearly, and with necessary qualifications/limitations? | Not at all to Definitely |
| Support | Did Dr X rely on evidence when forming their assertion that the perpetrator from the CCTV and the suspect are the same person? | Not at all to Definitely |
| Consistency | Is the assertion that the perpetrator from the CCTV and the suspect are the same person consistent with what other experts in forensic gait analysis would assert? | Not at all to Definitely |
| Trustworthiness | Do you believe that Dr X is fair, impartial and objective? | Not at all to Definitely |
| Persuasiveness | | |
| Credibility | How credible is Dr X? | Not at all to Definitely credible |
| Value | How valuable was Dr X’s testimony? | Not at all to Definitely valuable |
| Weight | How much weight do you give to Dr X’s testimony? | None at all to The most possible |
Note: ExPEx = Expert Persuasion Expectancy.
Persuasiveness
Questions about the credibility of the expert, the value of their testimony and the weight given to the testimony (from 0 to 100) were averaged to create a single measure of persuasiveness for each participant (see Table 2 for verbatim question wording and response anchors). Participants’ credibility, value and weight ratings were all strongly and positively correlated. All bivariate correlation coefficients were ≥.821 (credibility–value = .821; credibility–weight = .873; value–weight = .913), all ps < .001, Cronbach’s α = .95.
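As a rough consistency check, the reported α can be approximately recovered from the three correlations alone using the standardized form of Cronbach’s alpha, which depends only on the number of items k and their mean inter-item correlation. This is a sketch, not the authors’ computation: the study’s α was presumably calculated from the raw item scores, and the helper names below are illustrative.

```python
from statistics import fmean

def persuasiveness(credibility: float, value: float, weight: float) -> float:
    """Composite persuasiveness: unweighted mean of the three 0-100 ratings."""
    return fmean([credibility, value, weight])

def standardized_alpha(inter_item_rs: list[float], k: int) -> float:
    """Standardized Cronbach's alpha from the k(k - 1)/2 inter-item correlations:
    alpha = k * r_bar / (1 + (k - 1) * r_bar), where r_bar is their mean."""
    r_bar = fmean(inter_item_rs)
    return k * r_bar / (1 + (k - 1) * r_bar)

# Hypothetical ratings for one participant (not study data).
print(persuasiveness(80, 70, 90))  # 80.0

# Reported correlations: credibility-value, credibility-weight, value-weight.
print(round(standardized_alpha([0.821, 0.873, 0.913], k=3), 2))  # 0.95
```

Applied to the six Field/Specialty/Ability/Trustworthiness correlations reported later (k = 4), the same function gives roughly 0.956, in line with the reported α = .96.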
Agreement
Binary (yes/no) agreement with the expert opinion was measured to cross-validate persuasiveness ratings. Participants were asked ‘If Dr X reported that the defendant was the person in the crime-related images, would you agree with that opinion?’.
Requested additional information
Magistrates were asked to nominate three things that they would most like more information about to help in evaluating the expert’s opinion (or answer N/A). Space was provided for three open responses.
Demographic information
The demographic information questions varied by participant sample. Magistrates were asked to provide their legal qualifications, the jurisdiction where their qualifications were obtained, past legal experience, current position, year of admission as a legal practitioner, year of appointment as a magistrate or judge and level of judicial appointment. MTurk workers were asked about their age, highest level of completed education, jury eligibility and experience. Both groups of participants were asked about their ethnic/cultural background, gender identity and whether English was their first language. Verbatim demographic questions are available in the Supplementary Materials.
Procedure
After providing consent, participants were randomly assigned to receive the high- or low-quality expert report. Participants were advised that they would be asked questions after reading the report and were free to review it when providing their answers. Participants then answered questions about the ExPEx attributes, persuasiveness and their agreement with the expert opinion. Magistrates were able to request additional information at this point, before all participants provided demographic information. All participants were debriefed and thanked for their involvement at the end of the study.
Results
Manipulation checks
Independent samples t tests comparing ratings of the eight ExPEx attributes in the strong and weak conditions were conducted as manipulation checks (see Table 3). The manipulated attributes, Field, t(45) = −13.06, p < .001 (95% confidence interval, CI [−91.25, −66.86]), Specialty, t(45) = −11.76, p < .001 (95% CI [−85.58, −60.55]), Ability, t(45) = −6.22, p < .001 (95% CI [−68.07, −34.76]) and Trustworthiness, t(44) = −10.94, p < .001 (95% CI [−80.73, −55.62]) were all rated significantly lower in the low- rather than in the high-quality condition. Significant differences were also observed for Foundation and Opinion even though these attributes were not varied by condition. Ratings of Support and Consistency attributes did not differ by opinion quality (see Table 3).
Table 3.
Independent samples t tests for ExPEx attribute ratings by expert quality condition.
| ExPEx attribute | Weak M (SD) | Strong M (SD) | t (df) | p | 95% CI |
|---|---|---|---|---|---|
| Foundation | 41.21 (37.45) | 65.17 (35.18) | −2.26 (45) | .029 | [−45.33, −2.60] |
| Field | 13.21 (20.57) | 92.26 (20.94) | −13.06 (45) | <.001 | [−91.25, −66.86] |
| Specialty | 14.50 (22.02) | 87.57 (20.53) | −11.76 (45) | <.001 | [−85.58, −60.56] |
| Ability | 27.50 (33.19) | 78.91 (22.14) | −6.22 (45) | <.001 | [−68.07, −34.76] |
| Opinion | 20.50 (23.27) | 72.74 (29.80) | −6.72 (45) | <.001 | [−67.91, −36.57] |
| Support | 60.74 (31.65) | 77.09 (26.69) | −1.89 (44) | .065 | [−33.75, 1.05] |
| Consistency | 64.78 (36.42) | 80.39 (26.90) | −1.65 (40.50) | .106 | [−34.68, 3.46] |
| Trustworthiness | 17.22 (21.99) | 85.39 (20.22) | −10.94 (44) | <.001 | [−80.73, −55.62] |
Note: ExPEx = Expert Persuasion Expectancy; CI = confidence interval.
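The t statistics in Table 3 can be approximately reproduced from the summary statistics. The sketch below takes the per-condition sizes of 24 (weak: 15 magistrates + 9 MTurk) and 23 (strong: 10 + 13) from the Design section and uses Student’s pooled-variance test, which gives df = 45. The fractional df reported for Consistency (40.50) suggests a Welch correction for unequal variances; that is our inference, and reproducing it requires assuming 23 ratings in each condition for that item, which the article does not report. Function names are illustrative.

```python
from math import sqrt

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t using a pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t for unequal variances, with the Welch-Satterthwaite df."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Field (Table 3): weak n = 15 + 9 = 24, strong n = 10 + 13 = 23.
t, df = pooled_t(13.21, 20.57, 24, 92.26, 20.94, 23)
print(f"Field: t({df}) = {t:.2f}")  # Field: t(45) = -13.05

# Consistency: ASSUMES 23 ratings per condition (not reported in the article).
t, df = welch_t(64.78, 36.42, 23, 80.39, 26.90, 23)
print(f"Consistency: t({df:.2f}) = {t:.2f}")  # Consistency: t(40.50) = -1.65
```

The small gap between −13.05 and the published −13.06 is expected, because the table’s means and SDs are rounded to two decimals.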
Persuasiveness
A 2 (sample: MTurk vs. magistrate) × 2 (expert quality: high vs. low) between-participants analysis of variance (ANOVA) was conducted to examine whether participant sample and expert quality significantly affected the persuasiveness of the expert opinion (see Figure 1). Persuasiveness was rated significantly higher in the high-quality (M = 79.13, SD = 20.97) than in the low-quality condition (M = 16.67, SD = 19.42), indicating that participants from both samples were sensitive to expert opinion quality, F(1, 43) = 136.93, p < .001, ηp² = .761. Overall, MTurk participants also viewed the expert as significantly more persuasive (low quality: M = 30.15, SD = 24.49; high quality: M = 90.38, SD = 10.58) than did the magistrates (low quality: M = 8.58, SD = 9.49; high quality: M = 64.50, SD = 22.48), F(1, 43) = 22.85, p < .001, ηp² = .347. The interaction between expert quality condition and sample was not significant, F(1, 43) = 0.19, p = .67, ηp² = .004.
Figure 1.
Estimated marginal mean persuasiveness by participant sample and expert report quality (95% confidence interval).
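Partial eta squared can be recovered from each F ratio and its degrees of freedom. With 47 participants in a 2 × 2 between-participants design, the error df is 47 - 4 = 43, and that value reproduces all three reported effect sizes (.761, .347, .004). The sketch also shows that the overall condition means reported in the text are n-weighted averages of the cell means; estimated marginal means such as those plotted in Figure 1 typically weight cells equally instead. Function names are illustrative.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return f * df_effect / (f * df_effect + df_error)

def weighted_mean(cells: list[tuple[float, int]]) -> float:
    """Mean across cells of unequal size, weighting each cell mean by its n."""
    total_n = sum(n for _, n in cells)
    return sum(m * n for m, n in cells) / total_n

N, N_CELLS = 47, 4
DF_ERROR = N - N_CELLS  # 43 in a 2 x 2 between-participants design

for effect, f in [("quality", 136.93), ("sample", 22.85), ("interaction", 0.19)]:
    print(effect, round(partial_eta_squared(f, 1, DF_ERROR), 3))
# quality 0.761 / sample 0.347 / interaction 0.004

# Condition means from (cell mean, cell n): magistrates first, then MTurk.
print(round(weighted_mean([(64.50, 10), (90.38, 13)]), 2))  # 79.13 (high quality)
print(round(weighted_mean([(8.58, 15), (30.15, 9)]), 2))    # 16.67 (low quality)
```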
Agreement
Linear regression with agreement (yes/no) as the predictor revealed that, irrespective of report quality, agreement was positively associated with persuasiveness (unstandardised b = 61.49, p < .001, 95% CI [49.09, 73.88]). Participants who ultimately agreed with the expert rated them, on average, 61.49 points (out of 100) more persuasive than participants who disagreed with the expert. The overall model was significant, accounting for 68.9% of the variance in persuasiveness (R = .83, R² = .689, adjusted R² = .68), F(1, 45) = 99.87, p < .001.
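With a single 0/1 predictor, the ordinary least squares slope b is simply the difference between the two group means, which is why b can be read directly as “agreers rated the expert 61.49 points higher”, and the intercept is the mean of the disagree group. The sketch below demonstrates this on a small hypothetical dataset; the values are invented for illustration and are not the study’s data.

```python
from statistics import fmean

def ols_slope_intercept(x, y):
    """Ordinary least squares with one predictor: slope = cov(x, y) / var(x)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# HYPOTHETICAL data (not the study's): 0 = disagreed, 1 = agreed with Dr X.
agree = [0, 0, 0, 1, 1, 1]
persuasive = [10, 20, 30, 70, 80, 90]

b, a = ols_slope_intercept(agree, persuasive)
agreed = [p for g, p in zip(agree, persuasive) if g == 1]
disagreed = [p for g, p in zip(agree, persuasive) if g == 0]
print(b, a, fmean(agreed) - fmean(disagreed))  # 60.0 20.0 60.0
```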
Requested additional information
Sixteen magistrates requested 40 pieces of additional information about the expert and their opinion. A thematic analysis was undertaken to examine whether magistrates’ requests for additional information related to the ExPEx attributes. Using Braun and Clarke’s (2006) theoretical thematic analysis strategy, magistrates’ written responses were analysed independently by two raters. A ‘dictionary’ outlining the theory and characteristics regarding each of the eight ExPEx attributes was developed prior to coding to assist in attributing excerpts of the responses to a dimension. Since responses could be coded into multiple categories it was possible for the raters to agree about: (a) all categories for an item (i.e. exact agreement); (b) one but not all categories for an item (i.e. partial agreement); or (c) no categories for an item (i.e. disagreement).
After the first round of coding, the two independent coders were in exact agreement about 24 of the 40 items (60.0%), in partial agreement about an additional 10 items (25.0%) and disagreed about six items (15.0%). Disagreements were resolved by discussion. Following resolution, all but two of the requested items related to one or more ExPEx attributes, and one of these two responses was partially illegible. Magistrates most frequently requested additional information relating to Specialty (27.5% of codes), Support (17.6%), and Field and Opinion (11.8% each). See Table 4 for ExPEx attribute frequencies and indicative requests.
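The three-way agreement scheme described above can be expressed as a comparison of the two raters’ sets of codes for each response. A minimal sketch, assuming each response is represented as a set of ExPEx attribute labels per rater; the function names and the illustrative pairs are hypothetical, chosen only so that the counts match the reported 24/10/6 split of 40 items.

```python
def classify(codes_a: set[str], codes_b: set[str]) -> str:
    """Classify one response by comparing the two raters' sets of ExPEx codes."""
    if codes_a == codes_b:
        return "exact"       # same set of categories
    if codes_a & codes_b:
        return "partial"     # at least one shared category, but not all
    return "disagree"        # no shared category

def agreement_summary(pairs: list[tuple[set[str], set[str]]]) -> dict[str, tuple[int, float]]:
    """Count each outcome and express it as a percentage of all coded items."""
    counts = {"exact": 0, "partial": 0, "disagree": 0}
    for codes_a, codes_b in pairs:
        counts[classify(codes_a, codes_b)] += 1
    n = len(pairs)
    return {k: (v, round(100 * v / n, 1)) for k, v in counts.items()}

# Illustrative pairs only (not the study's codes), sized to match 24 / 10 / 6.
pairs = (
    [({"Specialty"}, {"Specialty"})] * 24
    + [({"Field", "Specialty"}, {"Specialty"})] * 10
    + [({"Support"}, {"Opinion"})] * 6
)
print(agreement_summary(pairs))
# {'exact': (24, 60.0), 'partial': (10, 25.0), 'disagree': (6, 15.0)}
```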
Table 4.
ExPEx attribute and indicative requests for additional information by magistrates.
| ExPEx attribute | Proportion of codes (%) | Indicative request (requester ID) |
|---|---|---|
| Foundation | 7.84 | ‘Margin of error in accurate identification.’ (25) |
| Field | 11.76 | ‘What qualifications/experience (if any) make Dr X more qualified than any lay person . . . ’ (2) |
| Specialty | 27.45 | ‘Any previous experience in evaluating such evidence . . . ’ (5) |
| Ability | 7.84 | ‘Explanation for previous errors in other cases?’ (15) |
| Opinion | 11.76 | ‘What are limitations in undertaking exercise.’ (27) |
| Support | 17.65 | ‘What evidence does he rely upon to support his conclusion . . . ’ (2) |
| Consistency | 7.84 | ‘Any opposing reports.’ (26) |
| Trustworthiness | 3.92 | ‘Basis for “hired gun”’ (5) |
| Other | 3.92 | ‘Dr X to deliver the report in first person.’ (23) |
Note: ExPEx = Expert Persuasion Expectancy.
Predictors of persuasiveness
We conducted an additional, unplanned regression to explore whether participants’ ExPEx attribute ratings predicted expert persuasiveness. However, because Field, Specialty, Ability and Trustworthiness were manipulated simultaneously, ratings of these attributes were strongly and positively correlated. All bivariate correlation coefficients were ≥.737 (Field–Specialty = .949; Field–Ability = .761; Field–Trustworthiness = .892; Specialty–Ability = .737; Specialty–Trustworthiness = .951; Ability–Trustworthiness = .771), all ps < .001, Cronbach’s α = .96. With intercorrelations this high (multicollinearity), separate coefficients for each attribute would be unstable and uninterpretable, so the attributes could not sensibly be entered as separate predictors. Therefore, a continuous meta-variable, ‘Quality’, was computed by averaging each participant’s Field, Specialty, Ability and Trustworthiness ratings. This composite was entered into a linear regression to predict persuasiveness ratings.
The regression revealed that persuasiveness was significantly predicted by perceptions of quality. On average, each 1-unit increase in quality ratings was associated with a 0.91-unit increase in persuasiveness ratings (b = 0.91, p < .001, 95% CI [0.82, 1.00]), and the model accounted for 90.1% of the variance in persuasiveness ratings (R = .949, R² = .901, adjusted R² = .899), F(1, 45) = 410.30, p < .001.
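The adjusted R² and the overall F for a one-predictor regression follow directly from R² and N = 47. The sketch below checks both regressions reported in these Results; `quality_score` is a hypothetical helper mirroring the averaging described for the Quality meta-variable. Because the published R² values are rounded to three decimals, the recomputed F values land close to, but not exactly on, the reported 99.87 and 410.30.

```python
from statistics import fmean

def quality_score(field: float, specialty: float, ability: float, trust: float) -> float:
    """Hypothetical helper: the 'Quality' composite as the mean of four 0-100 ratings."""
    return fmean([field, specialty, ability, trust])

def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1) for k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall model F(k, n - k - 1) = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

print(quality_score(80, 70, 90, 60))  # 75.0 (hypothetical ratings)

N = 47
for model, r2 in [("agreement", 0.689), ("quality", 0.901)]:
    print(model, round(adjusted_r2(r2, N, 1), 3), round(f_from_r2(r2, N, 1), 1))
# agreement 0.682 99.7 / quality 0.899 409.5
```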
Discussion
In this study we examined how magistrates’ perceptions of the persuasiveness of an expert opinion were affected by weaknesses in Field, Specialty, Ability and Trustworthiness attributes. We found that both magistrates and jury-eligible lay people were significantly more persuaded by an expert opinion that was strong in these attributes than one that was weak. We also found that participants were more likely to agree than disagree with the opinion of the expert they regarded as persuasive, and that ratings of persuasiveness were strongly determined by perceptions of the expert’s ‘quality’ (averaged across Field, Specialty, Ability and Trustworthiness).
Furthermore, when given the opportunity, magistrates asked for additional information about attributes that were logically relevant to the quality of the opinion and rarely requested information outside the ExPEx framework. In particular, magistrates tended to seek additional information about the witness’s specific training, study and experience relevant to the opinion (Specialty) and the basis for that opinion (Support). These results are consistent with the self-reported beliefs (Champagne et al., 1990) and questioning strategies (Tadei et al., 2016) reported by judges in past research. They also show that magistrates’ perceptions of the quality of an expert opinion are actually affected by attributes regarded as influential and relevant, and suggest that the ExPEx framework captures the informational needs of magistrates well.
We also found evidence that judges were critical of expert opinions in general. Although magistrates and lay people were equally sensitive to opinion quality, magistrates were more sceptical of (i.e. less persuaded by) expert opinions overall. This may have been a particular reaction to the experimental scenario, or it may reflect a general and genuine critical approach to expert evidence among magistrates. However, we cannot disentangle these explanations given the design of the current study. Further research is needed directly comparing the assessments of judges and jurors to see whether a sceptical response is also evident in more realistic decision-making scenarios.
Overall, the results of this study suggest that magistrates may be more critical of expert opinions and more sensitive to their quality than previously believed (De Keijser & Elffers, 2012; Faigman, 2006; Hans, 2007; Kovera & McAuliff, 2000). However, it remains to be seen whether our results generalise to other experimental contexts, where other ExPEx attributes are manipulated, or are manipulated in different ways. Past research suggests that judges may not be sensitive to manipulations of Foundation if it is operationalised in terms of falsifiability or error rate (Gatowski et al., 2001), or to Opinion if operationalised using a prosecutor’s fallacy (De Keijser & Elffers, 2012). However, they may be able to detect Consistency if presented in terms of peer review or general acceptance (Gatowski et al., 2001). Clearly, if we hope to understand the strengths and weaknesses of the decision-making strategies used by professional and lay factfinders, we need to undertake more research that explores the assessment of expert opinion quality, particularly via studies that operationalise the quality of an expert opinion using multiple relevant attributes in addition to its scientific foundations.
Limitations
It is also important to note that the magistrates’ performance in experimental scenarios may not generalise to real-world decision-making. First, the information that we provided to our participants was simplistic and does not represent the level of technical complexity present in real expert reports or testimony (Freckelton, Goodman-Delahunty, Horan, & McKimmie, 2016; Howes, Julian, Kelty, Kemp, & Kirkbride, 2014; Howes, Kirkbride, Kelty, Julian, & Kemp, 2014). We also manipulated four quality cues simultaneously. We did this because the population of magistrates is small, meaning that we would not have had sufficient power to test for differences between more than two conditions. However, manipulating four cues at once is likely to have made it much easier for magistrates in our study to differentiate between high- and low-quality expert opinions than it would be in real trials, where other procedural considerations, traditions and complexities come into play (Edmond, Cunliffe, Martire, & San Roque, 2019; Edmond, Martire, & San Roque, 2017). Consequently, our results should be interpreted cautiously but not disregarded.
Second, magistrates are not routinely provided with information about all of the ExPEx attributes in real trials. Experts can present their evidence via abridged expert certificates, or they may submit a detailed report but not be called to give oral testimony. Furthermore, information about some ExPEx attributes can be very difficult to produce. For example, Ability is notoriously hard to measure, and information about it is rarely if ever provided by experts to courts (Garrett & Mitchell, 2018; Martire & Edmond, 2016; Wilson-Wilde, Romano, & Smith, 2019). Consequently, magistrates in real trials may be unable to reproduce the differentiation between high- and low-quality experts seen in our study, because the same information is not available to them. Indeed, this gap between the information necessary for evaluating expert quality and the information actually available may partly explain why judges, who appear to know what information they need to assess expert opinion quality, are at times persuaded by low-quality expert opinions in trials. Further research is needed to explore this possibility.
Conclusion
This study is the first to show that magistrates’ evaluations of expert opinion quality can be influenced in appropriate ways by logically relevant expert attributes. While this is a positive result, we know that professional and lay factfinders can and do rely on low-quality expert opinions at times. Therefore, we must continue to explore the basis for factfinders’ determinations of expert persuasiveness if we hope to reduce reliance on false and flawed expert opinions and improve the rectitude of trial outcomes.
Acknowledgements
Kind thanks to the magistrates who volunteered their time to participate in this study.
Funding Statement
This work was supported by the Australian Research Council under Linkage Project Grant LP160100008 to Kristy A. Martire.
Ethical standards
Declaration of conflicts of interest
Kristy A. Martire has declared no conflicts of interest.
Bronte Montgomery-Farrer has declared no conflicts of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the University of New South Wales (UNSW) Sydney Human Research Ethics Approval Panel (HREAP C Approval Number: HC3161) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Data availability statement
The data that support the findings of this study are openly available in OSF at https://osf.io/4nuyk/?view_only=6f43a73600844218971cb925fb621388
References
- Australian Bureau of Statistics (2019). Criminal Courts, Australia, 2017–18, cat. no. 4513.0. Retrieved from https://www.abs.gov.au/AUSSTATS/abs@.nsf/allprimarymainfeatures/D8D460DDF174BC36CA2582410016B417?opendocument
- Bernstein, D. (1990). Out of the Fryeing pan and into the fire: The expert witness problem in toxic tort legislation. Review of Litigation, 10, 117–160.
- Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. doi: 10.1191/1478088706qp063oa
- Champagne, A., Shuman, D., & Whitaker, E. (1990). An empirical examination of the use of expert witnesses in American courts. Jurimetrics, 31, 375–392.
- Cole, S.A. (2003). Fingerprinting: The first junk science. Oklahoma City University Law Review, 28, 73–92.
- Cunliffe, E., & Edmond, G. (2013). Gaitkeeping in Canada: Mis-steps in assessing the reliability of expert testimony. Canadian Bar Review, 92, 327–368.
- De Keijser, J., & Elffers, H. (2012). Understanding of forensic expert reports by judges, defense lawyers and forensic professionals. Psychology, Crime & Law, 18(2), 191–207. doi: 10.1080/10683161003736744
- Deitch, A. (2009). An inconvenient tooth: Forensic odontology is an inadmissible junk science when it is used to match teeth to bitemarks in skin. Wisconsin Law Review, 5, 1205–1236.
- Dioso-Villa, R. (2015). A repository of wrongful convictions in Australia: First steps toward estimating prevalence and causal contributing factors. Flinders Law Journal, 17, 163–202.
- Edens, J.F., Smith, S.T., Magyar, M.S., Mullen, K., Pitta, A., & Petrila, J. (2012). “Hired guns,” “charlatans,” and their “voodoo psychobabble”: Case law references to various forms of perceived bias among mental health expert witnesses. Psychological Services, 9(3), 259–271. doi: 10.1037/a0028264
- Edmond, G., Martire, K., & San Roque, M. (2017). Expert reports and the forensic sciences. UNSW Law Journal, 40, 590.
- Edmond, G., & San Roque, M. (2012). The cool crucible: Forensic science and the frailty of the criminal trial. Current Issues in Criminal Justice, 24(1), 51–68. doi: 10.1080/10345329.2012.12035944
- Edmond, G., Cunliffe, E., Martire, K.A., & San Roque, M. (2019). Forensic science evidence and the limits of cross-examination. Melbourne University Law Review. https://law.unimelb.edu.au/__data/assets/pdf_file/0007/3105646/Edmond-et-al-Advance-423.pdf
- Faigman, D.L. (2006). Judges as amateur scientists. Boston University Law Review, 86, 1207–1226.
- Findley, K.A. (2008). Innocents at risk: Adversary imbalance, forensic science, and the search for truth. Seton Hall Law Review, 38(3), 893–947.
- Freckelton, I., Goodman-Delahunty, J., Horan, J., & McKimmie, B. (2016). Expert evidence and criminal jury trials. New York, NY: Oxford University Press.
- Garrett, B.L. (2017). Actual innocence and wrongful convictions. In E. Luna (Ed.), Academy for justice, a report on scholarship and criminal justice reform (2017 Forthcoming). North Carolina, NC: Duke Law School Public Law & Legal Theory Series.
- Garrett, B.L., & Mitchell, G. (2018). The proficiency of experts. University of Pennsylvania Law Review, 166(4), 901–960.
- Garrett, B.L., & Neufeld, P.J. (2009). Invalid forensic science testimony and wrongful convictions. Virginia Law Review, 95(1), 1–97. https://www.jstor.org/stable/25475240
- Gatowski, S.I., Dobbin, S.A., Richardson, J.T., Ginsburg, G.P., Merlino, M.L., & Dahir, V. (2001). Asking the gatekeepers: A national survey of judges on judging expert evidence in a post-Daubert world. Law and Human Behavior, 25(5), 433–458. doi: 10.1023/A:1012899030937
- Gross, S.R. (1991). Expert evidence. Wisconsin Law Review, 1, 1113–1232.
- Gross, S.R., & Shaffer, M. (2012). Exonerations in the United States, 1989–2012. University of Michigan Public Law Working Paper, 277. Michigan, US: University of Michigan Law School.
- Hans, V.P. (2007). Judges, juries, and scientific evidence. Journal of Law and Policy, 16, 19–46.
- Howes, L.M., Julian, R., Kelty, S.F., Kemp, N., & Kirkbride, K.P. (2014). The readability of expert reports for non-scientist report-users: Reports of DNA analysis. Forensic Science International, 237, 7–18. doi: 10.1016/j.forsciint.2014.01.007
- Howes, L.M., Kirkbride, K.P., Kelty, S.F., Julian, R., & Kemp, N. (2014). The readability of expert reports for non-scientist report-users: Reports of forensic comparison of glass. Forensic Science International, 236, 54–66. doi: 10.1016/j.forsciint.2013.12.031
- Imwinkelried, E.J. (2009). Shaken baby syndrome: a genuine battle of the scientific (and non-scientific) experts. Criminal Law Bulletin, 46(1), 1–44. doi: 10.2139/ssrn.1494672
- Jurs, A.W. (2015). Expert prevalence, persuasion, and price: What trial participants really think about experts. Indiana Law Journal, 91, 353–392.
- Kovera, M.B., & McAuliff, B.D. (2000). The effects of peer review and evidence quality on judge evaluations of psychological science: Are judges effective gatekeepers? Journal of Applied Psychology, 85(4), 574–586. doi: 10.1037//0021-9010.85.4.574
- Martire, K.A., & Edmond, G. (2016). Rethinking expert opinion evidence. Melbourne University Law Review, 40, 967–998.
- Martire, K.A., Edmond, G., & Navarro, D. (2020). Exploring juror evaluations of expert opinions using the expert persuasion expectancy framework. Legal and Criminological Psychology. doi: 10.1111/lcrp.12165
- Martire, K.A., & Kemp, R.I. (2011). Can experts help jurors to evaluate eyewitness evidence? A review of eyewitness expert effects. Legal and Criminological Psychology, 16(1), 24–36. doi: 10.1348/135532509X477225
- Monahan, J., Walker, L., & Mitchell, G. (2008). Contextual evidence of gender discrimination: The ascendance of “social frameworks.” Virginia Law Review, 94(7), 1715–1749. https://www.jstor.org/stable/25470599
- National Research Council (2009). Strengthening forensic science in the United States: a path forward. Washington, DC: National Academies Press. doi: 10.17226/12589
- PCAST (President’s Council of Advisors on Science and Technology). (2016). Report to the president: Forensic science in criminal courts: Ensuring scientific validity of feature-comparison methods. Washington, DC: Executive Office of the President, President's Council of Advisors on Science and Technology. http://www.crime-scene-investigator.net/PDF/forensic-science-in-criminal-courts-ensuring-scientific-validity-of-feature-comparison-methods.pdf
- Risinger, D.M., Denbeaux, M.P., & Saks, M.J. (1989). Exorcism of ignorance as a proxy for rational knowledge: the lessons of handwriting identification expertise. University of Pennsylvania Law Review, 137(3), 731. doi: 10.2307/3312276
- Skene, J.P. (2018). Up to the courts: Managing forensic testimony with limited scientific validity. Judicature, 102(1), 39–49.
- Tadei, A., Finnilä, K., Reite, A., Antfolk, J., & Santtila, P. (2016). Judges’ capacity to evaluate psychological and psychiatric expert testimony. Nordic Psychology, 68(3), 204–217. doi: 10.1080/19012276.2015.1125303
- Taupin, J.M. (2004). Forensic hair morphology comparison–a dying art or junk science? Science & Justice, 44(2), 95–100. doi: 10.1016/S1355-0306(04)71695-0
- Von Ahn, L., Blum, M., Hopper, N.J., & Langford, J. (2003). CAPTCHA: Using hard AI problems for security. In International conference on the theory and applications of cryptographic techniques (pp. 294–311). Berlin, DE: Springer.
- Walton, D. (1997). Appeal to expert opinion: Arguments from authority. Pennsylvania, PA: Penn State Press.
- Wilson-Wilde, L., Romano, H., & Smith, S. (2019). Error rates in proficiency testing in Australia. Australian Journal of Forensic Sciences, 51(sup1), S268–S271. doi: 10.1080/00450618.2019.1569154