We appreciate the opportunity to respond to the review of our work (1) provided in Kukucka (2) and offer the following responses and clarifications:
“First, despite characterizing inconclusive decisions as ‘neither correct nor incorrect,’ the authors effectively counted them as correct…”
Examiners’ performance is multi-dimensional. Inconclusive decisions do not count as errors in calculating false positive/negative rates, but they also do not count as correct in calculating true positive/negative rates. Kukucka bases his assertion on a paper (3) that has been refuted (4, 5). The rate Kukucka suggests (“FPRCALLS”) is one of many rates we report: Appendix F1 discusses the varied means of handling inconclusive decisions in conclusion rates.
“Second… include both ‘definite’ and ‘probable’ judgments in the error rate numerator”
In our study and in operational casework, forensic examiners explicitly differentiate between levels in conclusion scales to convey different strengths of conclusions or weights of evidence. There is no single error rate: we present four distinct rates of incorrect decisions by which examiners can be assessed in terms of accuracy, as well as four distinct rates of correct decisions by which examiners can be assessed in terms of effectiveness. Blurring such explicit distinctions oversimplifies the results and may be misleading.
“…raises the false positive rate to 8.2%”
We report the rate Kukucka suggests (“FPR + IARCALLS”) in Appendix F2, but the correct value is 9.3%.
“…definitive and qualified conclusions have an equivalent impact on juror decision-making…”
Kukucka’s references (6, 7) do not support this assertion.
“Third, many examiners simply declined to compare some of the assigned sets…”
This is incorrect: participants were unable to choose which comparisons to complete. We controlled the order of assignments so that difficulty and sample attributes would be representative of the whole even if only half of the assigned samples were completed. If participants could have chosen “easy” comparisons and omitted “difficult” ones, we agree that this point might be relevant, but that is not how the test was actually designed or conducted.
“…under relatively ‘ideal’ conditions…”
There is no basis for this assertion. A supermajority of participants assessed the samples as representative of casework. Many factors increased the relative level of difficulty compared to casework: no originals were provided, close nonmates were methodically selected and included twins, and some samples were of limited length and comparability. Kukucka’s assertion cites a paper (8) containing numerous errors and misrepresentations, detailed in refs. (9–11).
“I fear that its purported error rate could dangerously mislead stakeholders…”
Reducing complex results to a single number can indeed be misleading. Kukucka essentially selects a method resulting in a high error rate by reducing the denominator and combining distinct categories in the numerator. This is one of several ways of summarizing the results; we are concerned that Kukucka represents this as the sole error rate. We present the results of our research in an empirical, transparent manner. By providing data with a variety of metrics, we give readers flexibility in interpreting and applying the results in a manner specific to their needs.
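The effect of these two choices on a summary rate can be illustrated with a minimal sketch. The counts below are hypothetical, chosen only to make the arithmetic easy to follow; they are not the study's data, and the category names and the `rate` helper are assumptions introduced for this example.

```python
from fractions import Fraction

# Hypothetical counts for nonmated comparisons on a five-level scale.
# Illustrative only: these are NOT the study's data.
counts = {
    "definite_fp": 3,    # definitive incorrect conclusion
    "probable_fp": 5,    # qualified ("probable") incorrect conclusion
    "inconclusive": 20,
    "probable_tn": 22,   # qualified correct conclusion
    "definite_tn": 50,   # definitive correct conclusion
}


def rate(counts, numerator_keys, exclude_inconclusive=False):
    """Return the share of comparisons falling in `numerator_keys`.

    With exclude_inconclusive=True, inconclusive decisions are dropped
    from the denominator, which shrinks it and raises the rate.
    """
    numerator = sum(counts[key] for key in numerator_keys)
    denominator = sum(counts.values())
    if exclude_inconclusive:
        denominator -= counts["inconclusive"]
    return Fraction(numerator, denominator)


definite = ["definite_fp"]
combined = ["definite_fp", "probable_fp"]

rates = {
    "definite only, all comparisons": rate(counts, definite),
    "definite only, inconclusives excluded": rate(counts, definite, True),
    "definite + probable, all comparisons": rate(counts, combined),
    "definite + probable, inconclusives excluded": rate(counts, combined, True),
}

for name, value in rates.items():
    print(f"{name}: {float(value):.2%}")
```

With these hypothetical counts, the same set of decisions yields four different rates, ranging from 3/100 to 8/80, depending only on which categories enter the numerator and whether inconclusive decisions remain in the denominator.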
Disclaimer
This is publication number 23.05 of the FBI Laboratory Division. Names of commercial manufacturers are provided for identification purposes only and inclusion does not imply endorsement of the manufacturer or its products or services by the FBI. This work was funded by the FBI Laboratory Division; Ideal Innovations and Noblis were funded under a contract award to Ideal Innovations Inc. from the FBI Laboratory. The views expressed are those of the authors and do not necessarily reflect the official policy or position of the FBI or the US Government.
Acknowledgments
Author contributions
R.A.H., L.E., M.S., J.B., and E.M.P. designed research; R.A.H., L.E., N.R., J.B., and R.S.P. performed research; R.A.H., L.E., N.R., P.B., T.M.B., M.S., and R.S.P. analyzed data; P.B. contributed FDE subject-matter expertise; E.M.P. provided overall project management and oversight; and R.A.H., L.E., N.R., M.S., J.B., and E.M.P. wrote the paper.
Competing interest
The authors declare no competing interest.
References
- 1. Hicklin R. A., et al., Accuracy and reliability of forensic handwriting comparisons. Proc. Natl. Acad. Sci. U.S.A. 119, e2119944119 (2022).
- 2. Kukucka J., Critique of forensic error rate calculations in Hicklin et al. Proc. Natl. Acad. Sci. U.S.A.
- 3. Dror I. E., Scurich N., (Mis)use of scientific measurements in forensic science. Forensic Sci. Int. Synerg. 2, 333–338 (2020).
- 4. Biedermann A., Kotsoglou K. N., Forensic science and the principle of excluded middle: “Inconclusive” decisions and the structure of error rate studies. Forensic Sci. Int. Synerg. 3, 100147 (2021).
- 5. Weller T. J., Morris M. D., Commentary on: I. Dror, N Scurich “(Mis)use of scientific measurements in forensic science” Forensic Science International: Synergy 2020 10.1016/j.fsisyn.2020.08.006. Forensic Sci. Int. Synerg. 2, 701–702 (2020).
- 6. Garrett B. L., Scurich N., Crozier W. E., Mock jurors’ evaluation of firearm examiner testimony. Law Hum. Behav. 44, 412–423 (2020).
- 7. Thompson W. C., Newman E. J., Lay understanding of forensic statistics: Evaluation of random match probabilities, likelihood ratios, and verbal equivalents. Law Hum. Behav. 39, 332–349 (2015).
- 8. Haber R. N., Haber L., Experimental results of fingerprint comparison validity and reliability: A review and critical analysis. Sci. Justice 54, 375–389 (2014).
- 9. Hicklin R. A., Ulery B. T., Buscaglia J., Roberts M. A., In response to Haber and Haber, “Experimental results of fingerprint comparison validity and reliability: A review and critical analysis”. Sci. Justice 54, 390–391 (2014).
- 10. Thompson M. B., Tangen J. M., Generalization in fingerprint matching experiments. Sci. Justice 54, 391–392 (2014).
- 11. Langenburg G., Neumann C., Champod C., A comment on experimental results of fingerprint comparison validity and reliability: A review and critical analysis. Sci. Justice 54, 393–395 (2014).
