Forensic Science International: Synergy. 2021 Sep 27;3:100202. doi: 10.1016/j.fsisyn.2021.100202

Calculation of likelihood ratios for inference of biological sex from human skeletal remains

Geoffrey Stewart Morrison a,b, Philip Weber a, Nabanita Basu a, Roberto Puch-Solis c, Patrick S Randolph-Quinney d,e
PMCID: PMC8498236  PMID: 34647000

Abstract

It is common in forensic anthropology to draw inferences (e.g., inferences with respect to biological sex of human remains) using statistical models applied to anthropometric data. Commonly used models can output posterior probabilities, but a threshold is usually applied in order to obtain a classification. In the forensic-anthropology literature, there is some unease with this “fall-off-the-cliff” approach. Proposals have been made to exclude results that fall within a “zone of uncertainty”, e.g., if the posterior probability for “male” is greater than 0.95 then the remains are classified as male, and if the posterior probability for “male” is less than 0.05 then the remains are classified as female, but if the posterior probability for “male” is between 0.05 and 0.95 the remains are not classified as either male or female. In the present paper, we propose what we believe is a simpler solution that is in line with interpretation of evidence in other branches of forensic science: implementation of the likelihood-ratio framework using relevant data, quantitative measurements, and statistical models. Statistical models that can implement this approach are already widely used in forensic anthropology. All that is required are minor modifications in the way those models are used and a change in the way practitioners and researchers think about the meaning of the output of those models. We explain how to calculate likelihood ratios using osteometric data and linear discriminant analysis, quadratic discriminant analysis, and logistic regression models. We also explain how to empirically validate likelihood-ratio models.

Keywords: Forensic inference and statistics, Forensic anthropology, Likelihood ratio, Sex assessment, Osteometry

1. Introduction

Forensic anthropology is the medico-legal application of biological anthropology. Forensic anthropologists apply to the analysis of human remains detailed knowledge of the development, the morphology, and the normal and abnormal variation of the human body. Analyses are conducted in order to assist legal-decision makers to make decisions with respect to identity of human remains [[1], [2], [3]]. Forensic anthropologists assist in the identification of individuals whose remains are severely decomposed, burned, disrupted, mutilated, or otherwise rendered difficult to recognize, particularly in cases where DNA evidence or odontological evidence are not available. Forensic anthropologists work on investigations related to unexplained natural deaths, accidents, homicide, war crimes, and genocide. They also increasingly work on disaster-victim identification, i.e., investigations related to mass fatality such as occur in building collapses, ship sinkings, and plane crashes.

Forensic anthropologists conduct evaluations with respect to chronological age, biological sex, living stature, and ancestry or population affinity. The analytical methods used can be divided into:

  • morphoscopic, i.e., based on visual assessment of shape and size; and

  • anthropometric/osteometric, i.e., based on instrumental measurements. The term “osteometric” applies to methods based on measurement of skeletal elements in particular.

Morphoscopic methods traditionally require considerable experience observing and understanding skeletal variation between individuals, populations, and age groups, and may be highly subjective in practice. Anthropometric methods are generally considered to be more objective, at least in the sense that intra- and inter-observer reliability is easier to assess. The most commonly used anthropometric measurements are point to point distances and angles. Some practitioners use a combination of morphoscopic and anthropometric methods.

It is common in forensic anthropology to draw inferences using statistical models applied to anthropometric data. A recently published book on the use of statistics and probability in forensic anthropology, Obertová et al. [4], for instance, includes multiple chapters by different authors describing multiple statistical methods, including cluster analysis [5], logistic regression [6], and discriminant function analysis [7].

Use of classification models is common, and binary classification models have long been used to draw inferences with respect to biological sex, e.g., [[8], [9], [10], [11], [12], [13]]. Commonly used models such as linear discriminant analysis, quadratic discriminant analysis, and logistic regression can output posterior probabilities, but in the forensic-anthropology literature a threshold is usually applied in order to obtain a classification.1 For example, if the posterior probability for “male” is greater than 0.5 (or equivalently the posterior probability for “female” is less than 0.5) then the bone is classified as coming from a male, and if the posterior probability for “male” is less than 0.5 (or equivalently the posterior probability for “female” is greater than 0.5) then the bone is classified as coming from a female. In the forensic-anthropology literature, e.g., [[14], [15], [16]], there is evidence of some unease with this “fall-off-the-cliff” approach in which results with very different posterior probabilities, e.g., 0.51 and 0.99 are treated the same but results with very similar posterior probabilities, e.g., 0.49 and 0.51 are treated differently.

Galeta & Brůžek [7] reviews literature that expresses concern about a “zone of uncertainty”, see Fig. 1. In this “zone of uncertainty” the posterior probability is relatively close to 0.5, and the probability that a bone will be misclassified is relatively high. In order to attempt to avoid misclassification, a procedure is adopted whereby the bone is not classified unless the posterior probability is relatively far from 0.5, e.g., if the posterior probability for “male” is greater than 0.95 then the remains are classified as male, and if the posterior probability for “male” is less than 0.05 then the remains are classified as female, but if the posterior probability for “male” is between 0.05 and 0.95 the remains are not classified as either male or female. In this example, the “zone of uncertainty” is between posterior probabilities of 0.05 and 0.95. Galeta & Brůžek [7] states that “It is a conservative approach, but it brings a high confidence of sex estimation at both the individual and the population level.” The aim is to have a high correct-classification rate (a low classification-error rate) for the bones that are classified,2 but this comes at the cost of not classifying some bones and in fact not drawing any inference about the sex of the latter bones. Non-classification can occur in a high proportion, even the majority, of cases. Galeta & Brůžek [7] discusses the trade-off between the correct-classification rate and the proportion of cases not classified.
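The thresholding procedure described above can be sketched in a few lines of Python. This is only an illustration of the “zone of uncertainty” rule: the function name `classify_with_zone` and the example posterior probabilities are hypothetical, and the default bounds of 0.05 and 0.95 are simply the values used in the example in the text.

```python
def classify_with_zone(p_male, lower=0.05, upper=0.95):
    """Classify from a posterior probability for "male".

    Returns "male", "female", or None when the posterior probability
    falls inside the zone of uncertainty [lower, upper].
    """
    if not 0.0 <= p_male <= 1.0:
        raise ValueError("p_male must be a probability in [0, 1]")
    if p_male > upper:
        return "male"
    if p_male < lower:
        return "female"
    return None  # not classified


# Hypothetical posterior probabilities, for illustration only
posteriors = [0.99, 0.51, 0.49, 0.03]
labels = [classify_with_zone(p) for p in posteriors]

# Proportion of cases for which no inference is drawn
unclassified = sum(label is None for label in labels) / len(labels)
print(labels, unclassified)
```

In this toy run, the results at 0.51 and 0.49 are both left unclassified, which illustrates the cost of the approach: no inference at all is drawn for those cases.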

Fig. 1.

Example (based on humeral-head-diameter data from [18]) of a univariate linear discriminant analysis model showing multiple threshold values at different posterior probabilities for the hypothesis that the osteometric measurement comes from a male. In this example, the prior probabilities for “male” versus “female” are equal. Also shown are a “zone of uncertainty” between posterior probabilities of 0.05 and 0.95, and verbal expressions corresponding to the posterior probability ranges 0–0.2, 0.2–0.5, 0.5–0.8, and 0.8–1 (the latter proposed in [18]).

Bartholdy et al. [18] propose reporting the correct-classification rate corresponding to the posterior-probability value calculated for the bone of interest. They propose either calculating the correct-classification rate at the exact posterior-probability value obtained, or precalculating the correct-classification rate for a number of preselected posterior-probability threshold values, e.g., 0.8, 0.9, 0.95, and then, once the posterior-probability value for the bone of interest is obtained, selecting the relevant precalculated result. For example, if the exact posterior-probability value obtained is between 0.8 and 0.9, one would report the correct-classification rate that was precalculated excluding test results with posterior-probability values between 0.2 and 0.8; if the exact posterior-probability value obtained is between 0.9 and 0.95, one would report the correct-classification rate that was precalculated excluding test results with posterior-probability values between 0.1 and 0.9; etc. Bartholdy et al. [18] also suggest that results could be reported as “female”, “probable female”, “probable male”, and “male” for posterior-probability ranges of, e.g., 0–0.2, 0.2–0.5, 0.5–0.8, and 0.8–1 respectively (see Fig. 1). Jerković et al. [17] propose the inverse solution of choosing a desired correct-classification rate and then finding the posterior-probability range that should be excluded in order to obtain this correct-classification rate.3

In the present paper, we propose what we believe is a simpler solution to the concerns expressed in the forensic-anthropology literature. We propose a move away from approaches in which the output is discretized into two or more bins, to an approach which makes direct use of continuously-valued output. Statistical models that can implement this approach are already widely used in forensic anthropology – all that is required to adopt this approach are minor modifications in the way those models are used and a change in the way practitioners and researchers think about the meaning of the output of the models. What we propose is implementation of the likelihood-ratio framework using relevant data, quantitative measurements, and statistical models.

We focus on explaining how to calculate likelihood ratios using linear discriminant analysis, quadratic discriminant analysis, and logistic regression models applied to osteometric data. For simplicity of exposition, we use data consisting of measurements made on a single skeletal element from each individual. The skeletal element we use is a humerus – humeri exhibit sexual dimorphism. The computer code for performing the calculations described in the present paper is provided at http://geoff-morrison.net/#LR_anthropology_2021. Parallel versions of the code are provided for Matlab, Python, and R.

2. Likelihood-ratio framework

Use of the likelihood-ratio framework is advocated by many who work in the area of forensic inference and statistics, e.g., Aitken et al. [19] with 31 authors/supporters, Morrison et al. [20] with 19 authors/supporters, and Morrison et al. [21] with 20 authors/supporters. Its use is also recommended in guidance documents issued by the following organizations:

  • Association of Forensic Science Providers of the United Kingdom and of the Republic of Ireland (AFSP)4 in 2009 [22].

  • Royal Statistical Society (RSS)5 in 2010 [23].

  • European Network of Forensic Science Institutes (ENFSI)6 in 2015 [24].

  • National Institute of Forensic Science of the Australia New Zealand Policing Advisory Agency (NIFS ANZPAA)7 in 2017 [25].

  • American Statistical Association (ASA)8 in 2019 [26].

  • Forensic Science Regulator for England & Wales (FSR)9 in 2021 [27].

Introductory texts on the likelihood-ratio framework include [[28], [29], [30], [31], [32], [33], [34], [35]]. Publications advocating or describing application of the likelihood-ratio framework in forensic anthropology include [[36], [37], [38], [39], [40], [41]].10

In the present paper, we do not attempt to provide a general introduction to the likelihood-ratio framework and arguments in favour of its use. Such information can be found in the references listed above. Instead, we focus on how to calculate likelihood ratios using the kinds of data and statistical models already familiar to practitioners and researchers in forensic anthropology. More complicated models can be used, and could potentially result in better performance, but for simplicity we focus on linear discriminant analysis, quadratic discriminant analysis, and logistic regression.11

For illustrative purposes, we use the humeral-measurement data from Bartholdy et al. [18]. The dataset contains measurements of maximum length, head diameter, and epicondylar breadth from the humeri of 36 males and 48 females. The dataset is small and the population does not reflect one that would be relevant for any modern forensic case, but it is a convenient dataset that will suffice to illustrate some statistical concepts. For univariate models we use the head-diameter measurements, and for bivariate models we use both head-diameter and epicondylar-breadth measurements.

The introductory literature on the likelihood-ratio framework tends to focus on what is often called “source attribution” or “individualization”, e.g., situations in which a legal-decision maker wants to decide whether the bone in question comes from a particular individual or from some other individual randomly selected from a specified relevant population. Here, we focus on a simpler “classification” problem with only two mutually-exclusive classes, e.g., a situation in which a legal-decision maker's task is to decide whether the skeletal element in question comes from a male or from a female from the specified relevant population.

3. Calculating a likelihood ratio using linear discriminant analysis

Traditionally in forensic anthropology, linear discriminant analysis is used to calculate a posterior probability to which a threshold is then applied to make a classification. When first developed, without the aid of modern computers, calculations for linear discriminant analysis were laborious. Linear discriminant functions were therefore used ([43], [44]). For a two-class problem, multivariate data could be transformed to values on a univariate linear discriminant function, and, assuming equal priors, each test datum could then be classified according to whether it was closer to the centroid of one class or the other. A higher prior probability for one class, and concomitantly lower prior probability for the other, would shift the threshold on the linear discriminant function further from the centroid of the first class and closer to the centroid of the second class. The calculation of the linear discriminant function was laborious, but thereafter classifying test data was easy as it did not require calculating the exact posterior probability for each new datum.

Using modern computers, the calculation of posterior probabilities (or of likelihoods) based on Gaussian distributions is trivial: all that is required is to enter training data into functions that calculate mean vectors and covariance matrices, then enter those statistics and the test data into functions that calculate probability densities. These functions are easily accessible in many programming languages and software packages. A posterior probability can be calculated as in Eq. (1), in which: HM is the hypothesis that the humerus comes from a male in the relevant population; HF is the hypothesis that the humerus comes from a female in the relevant population; p(HM|xQ) is the posterior probability that the “male” hypothesis HM is true given the measurement vector xQ from the bone in question; f(x|μ,Σ) is the probability density (the likelihood) of a Gaussian model with mean vector μ and covariance matrix Σ evaluated at vector x; μM and μF are mean vectors calculated using a sample of data known to come from males in the relevant population and a sample of data known to come from females in the relevant population respectively; Σ is a covariance matrix calculated using data pooled from both the male and female samples12; p(HM) is the prior probability that the “male” hypothesis is true; and p(HF) is the prior probability that the “female” hypothesis HF is true.

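$$
p(H_M \mid \mathbf{x}_Q) = \frac{f(\mathbf{x}_Q \mid \boldsymbol{\mu}_M, \boldsymbol{\Sigma})\, p(H_M)}{f(\mathbf{x}_Q \mid \boldsymbol{\mu}_M, \boldsymbol{\Sigma})\, p(H_M) + f(\mathbf{x}_Q \mid \boldsymbol{\mu}_F, \boldsymbol{\Sigma})\, p(H_F)} \tag{1}
$$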

Since HM and HF are mutually exclusive and exhaustive, p(HF) = 1 − p(HM) and p(HF|xQ) = 1 − p(HM|xQ), and Eq. (1) can be rearranged to obtain Eq. (2), which is a version of the odds-form of Bayes’ Theorem.

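$$
\frac{p(H_M \mid \mathbf{x}_Q)}{p(H_F \mid \mathbf{x}_Q)} = \frac{f(\mathbf{x}_Q \mid \boldsymbol{\mu}_M, \boldsymbol{\Sigma})}{f(\mathbf{x}_Q \mid \boldsymbol{\mu}_F, \boldsymbol{\Sigma})} \times \frac{p(H_M)}{p(H_F)} \tag{2}
$$

$$
\text{posterior odds} = \text{likelihood ratio} \times \text{prior odds}
$$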

In the odds-form of Bayes’ Theorem:

  • the prior odds represent the legal-decision maker's belief as to the relative probability that the “male” hypothesis is true versus that the “female” hypothesis is true before they consider the forensic practitioner's statement of the strength of the evidence;

  • the likelihood ratio is the forensic practitioner's statement of the strength of the evidence;

  • and the posterior odds represent the legal-decision maker's belief as to the relative probability that the “male” hypothesis is true versus that the “female” hypothesis is true after they have considered the forensic practitioner's statement of the strength of the evidence.

The likelihood ratio therefore quantifies the amount by which, in light of the evidence, the legal-decision maker updates their belief with respect to the relative probabilities of the “male” and the “female” hypotheses. This assumes that the legal-decision maker is applying Bayes’ Theorem and using the likelihood ratio provided by the forensic practitioner. These assumptions are adopted in order to explain the meaning of a likelihood ratio, not to describe how a legal-decision maker actually acts or to advise how a legal-decision maker should act.

For the likelihood-ratio value to be meaningful, one must also be satisfied that the data used for training the statistical models (e.g., the data used for calculating the mean vectors and the covariance matrix) are reasonably representative of the relevant population for the case.

The prior odds could be based on an estimate of the ratio of males to females in the relevant population, but could also depend on other evidence already presented in the case that has influenced the legal-decision maker's belief with respect to the relative probabilities of the two hypotheses.

In the likelihood-ratio framework, the task of the forensic practitioner is to assess and present the value of the likelihood ratio. The likelihood-ratio value can, in theory, be any number in the range 0 to +∞ (the log-likelihood-ratio value can be any number in the range –∞ to +∞). The larger the number the greater the support it gives for the hypothesis in the numerator of the likelihood ratio (in this example, HM), and the smaller the number the greater the support it gives for the hypothesis in the denominator of the likelihood ratio (in this example, HF). If the likelihood-ratio value is 1 (the log-likelihood-ratio value is 0), it gives equal support for both hypotheses, and the posterior odds will be the same as the prior odds.

Assuming equal priors, p(HM)=p(HF), hence prior odds p(HM)/p(HF)=1, a “zone of uncertainty” based on posterior probability for male between 0.05 and 0.95 would correspond to likelihood-ratio values in the range 0.05/0.95 to 0.95/0.05 (=1/19 to 19). Unlike an approach which does not draw any inference about the sex of bones with posterior probabilities within this “zone of uncertainty”, likelihood ratios provide meaningful information both outside and within this range, and they do not suffer from a “fall-off-the-cliff” effect. Likelihood-ratio values of 2, 10, or 1/15, for example, provide information that a legal-decision maker could logically use to update their beliefs, and likelihood-ratio values of 18.9 and 19.1 will not be presented to legal-decision makers as if they had very different meanings.
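The relationship between a posterior probability, the prior, and the likelihood ratio can be checked numerically. The following minimal sketch applies the odds form of Bayes' Theorem from Eq. (2); the function names are hypothetical and are introduced here only for illustration.

```python
def posterior_to_likelihood_ratio(p_male, prior_male=0.5):
    """Recover the likelihood ratio implied by a posterior and a prior probability."""
    posterior_odds = p_male / (1.0 - p_male)
    prior_odds = prior_male / (1.0 - prior_male)
    return posterior_odds / prior_odds


def update_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' Theorem: posterior odds = likelihood ratio x prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


# Bounds of the 0.05 to 0.95 "zone of uncertainty" under equal priors
lr_lower = posterior_to_likelihood_ratio(0.05)  # approximately 1/19
lr_upper = posterior_to_likelihood_ratio(0.95)  # approximately 19

# The same likelihood ratio combined with different prior odds
for prior_odds in (0.25, 1.0, 4.0):
    posterior = odds_to_probability(update_odds(prior_odds, 10.0))
    print(f"prior odds {prior_odds}: posterior probability {posterior:.3f}")
```

The loop illustrates why the practitioner reports the likelihood ratio rather than a posterior probability: the same likelihood-ratio value leads to different posterior probabilities depending on the prior odds, which are a matter for the legal-decision maker.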

Eq. (3) shows a univariate example of the calculation of a likelihood ratio Λ(x), and Eq. (4) shows a bivariate example of the calculation of a likelihood ratio Λ(x). Fig. 2(a) shows a graphical representation of Eq. (3) in which the likelihood ratio for measurement scalar x is the height of the “male” curve relative to the height of the “female” curve, and Fig. 3(a) shows a graphical representation of Eq. (4) in which the likelihood ratio for measurement vector x is the height of the “male” surface relative to the height of the “female” surface. The values inserted into the equations and used to plot the figures are taken from the Bartholdy et al. [18] dataset. One measurement (from a male) was selected and used as the scalar x=xQ in the univariate case and as the vector x=xQ in the bivariate case (hereinafter we drop the Q subscript), and the remainder of the data were used to calculate the values for the univariate statistics μM, μF, and σ, and for the bivariate statistics μM, μF, and Σ.

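$$
\Lambda(x) = \frac{f(x \mid \mu_M, \sigma)}{f(x \mid \mu_F, \sigma)} = \frac{f(x = 46.5 \mid \mu_M = 49.4, \sigma = 2.50)}{f(x = 46.5 \mid \mu_F = 41.6, \sigma = 2.50)} = \frac{0.0790}{0.0245} = 3.23 \tag{3}
$$

$$
\begin{aligned}
\Lambda(\mathbf{x}) &= \frac{f(\mathbf{x} \mid \boldsymbol{\mu}_M, \boldsymbol{\Sigma})}{f(\mathbf{x} \mid \boldsymbol{\mu}_F, \boldsymbol{\Sigma})}
= \frac{f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, \begin{bmatrix} \mu_{M,1} \\ \mu_{M,2} \end{bmatrix}, \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} \\ \sigma_{2,1} & \sigma_{2,2} \end{bmatrix}\right)}{f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, \begin{bmatrix} \mu_{F,1} \\ \mu_{F,2} \end{bmatrix}, \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} \\ \sigma_{2,1} & \sigma_{2,2} \end{bmatrix}\right)} \\
&= \frac{f\left(\begin{bmatrix} 46.5 \\ 59.0 \end{bmatrix} \,\middle|\, \begin{bmatrix} 49.4 \\ 63.9 \end{bmatrix}, \begin{bmatrix} 6.26 & 4.84 \\ 4.84 & 15.8 \end{bmatrix}\right)}{f\left(\begin{bmatrix} 46.5 \\ 59.0 \end{bmatrix} \,\middle|\, \begin{bmatrix} 41.6 \\ 55.3 \end{bmatrix}, \begin{bmatrix} 6.26 & 4.84 \\ 4.84 & 15.8 \end{bmatrix}\right)}
= \frac{0.00687}{0.00282} = 2.44
\end{aligned}
\tag{4}
$$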
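The calculations in Eqs. (3) and (4) can be sketched in plain Python using only the standard library. This is not the code distributed with the paper; the helper names `gaussian_pdf` and `gaussian_pdf_2d` are hypothetical, and the inputs are the rounded values printed in the equations.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Univariate Gaussian probability density evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def gaussian_pdf_2d(x, mean, cov):
    """Bivariate Gaussian density, using the closed-form inverse of a 2x2 matrix."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance: d^T * inverse(cov) * d
    maha = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))


# Univariate example, rounded values as printed in Eq. (3)
lr_uni = gaussian_pdf(46.5, 49.4, 2.50) / gaussian_pdf(46.5, 41.6, 2.50)

# Bivariate example, rounded values as printed in Eq. (4)
x = (46.5, 59.0)
pooled_cov = ((6.26, 4.84), (4.84, 15.8))
lr_bi = (gaussian_pdf_2d(x, (49.4, 63.9), pooled_cov)
         / gaussian_pdf_2d(x, (41.6, 55.3), pooled_cov))

print(f"univariate LR: {lr_uni:.2f}, bivariate LR: {lr_bi:.2f}")
```

With these rounded inputs the sketch gives likelihood ratios of roughly 3.5 and 2.6 rather than the published 3.23 and 2.44. The difference presumably arises because the published values were computed from unrounded statistics, which is an assumption on our part.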

Fig. 2.

Example (based on humeral-head-diameter data from [18]) of calculation of likelihood ratio using a univariate linear discriminant analysis model. (a): Calculation based on probability-density functions. (b): Calculation based on a linear equation.

Fig. 3.

Example (based on humeral head-diameter, HD, and epicondylar-breadth, EB, data from [18]) of calculation of likelihood ratio using a bivariate linear discriminant analysis model. (a): Calculation based on probability-density functions. (b): Calculation based on a linear equation.

Table 1 collects the example likelihood-ratio values calculated using the same measurement vector x and all the different models presented in the present paper.

Table 1.

Example likelihood-ratio values calculated using the same example measurement vector and different univariate and bivariate models.

| Model | Univariate (head diameter) | Bivariate (head diameter, epicondylar breadth) |
|---|---|---|
| Linear discriminant analysis | 3.23 | 2.44 |
| Logistic regression | 2.26 | 1.91 |
| Quadratic discriminant analysis | 4.22 | 2.64 |

Before leaving linear discriminant analysis and moving on to logistic regression, in Eqs. (5), (6), (7) we show the derivation of the linear equation for the calculation of a likelihood ratio using linear discriminant analysis. For simplicity, we only show the derivation of the univariate equation: y=a+bx, in which y is the natural logarithm of the likelihood ratio, a is the intercept, b is the slope, and x is the head-diameter measurement made on the humerus.

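$$
\begin{aligned}
y = \ln \Lambda(x) &= \ln \frac{f(x \mid \mu_M, \sigma)}{f(x \mid \mu_F, \sigma)}
= \ln \frac{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu_M)^2}{2\sigma^2}}}{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu_F)^2}{2\sigma^2}}}
= \ln e^{-\frac{(x-\mu_M)^2 - (x-\mu_F)^2}{2\sigma^2}} \\
&= -\frac{x^2 + \mu_M^2 - 2x\mu_M - x^2 - \mu_F^2 + 2x\mu_F}{2\sigma^2}
= \frac{-\mu_M^2 + 2x\mu_M + \mu_F^2 - 2x\mu_F}{2\sigma^2} \\
&= \frac{-\mu_M^2 + \mu_F^2}{2\sigma^2} + \frac{\mu_M - \mu_F}{\sigma^2}\, x = a + bx
\end{aligned}
\tag{5}
$$

$$
b = \frac{\mu_M - \mu_F}{\sigma^2} \tag{6}
$$

$$
a = \frac{-\mu_M^2 + \mu_F^2}{2\sigma^2} = -b\,\frac{\mu_M + \mu_F}{2} \tag{7}
$$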

In Eqs. (8) and (9), we show a univariate example of the calculation of a likelihood ratio Λ(x) given the same values as previously used in Eq. (3). Note that the final result in Eq. (9) is the same as the final result in Eq. (3). The same example is graphically represented in Fig. 2(b). Note that the straight line in Fig. 2(b) could be constructed by sweeping a probe along the x axis of Fig. 2(a) and at each point calculating the natural logarithm of the height of the “male” curve relative to the height of the “female” curve.

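$$
y = a + bx = \frac{-\mu_M^2 + \mu_F^2}{2\sigma^2} + \frac{\mu_M - \mu_F}{\sigma^2}\, x = \frac{-49.4^2 + 41.6^2}{2 \times 2.50^2} + \frac{49.4 - 41.6}{2.50^2} \times 46.5 = -56.8 + 1.25 \times 46.5 = 1.17 \tag{8}
$$

$$
\Lambda(x) = e^{y} = e^{1.17} = 3.23 \tag{9}
$$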
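The equivalence between the linear equation and the ratio of probability densities can be verified with a short sketch. The function names are hypothetical, and the inputs are again the rounded values printed in Eqs. (3) and (8).

```python
import math


def lda_coefficients(mu_m, mu_f, sd):
    """Intercept a and slope b of y = a + b*x, following Eqs. (6) and (7)."""
    b = (mu_m - mu_f) / sd ** 2
    a = -b * (mu_m + mu_f) / 2.0
    return a, b


def gaussian_pdf(x, mean, sd):
    """Univariate Gaussian probability density evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


mu_m, mu_f, sd, x = 49.4, 41.6, 2.50, 46.5

a, b = lda_coefficients(mu_m, mu_f, sd)
y = a + b * x                     # natural log of the likelihood ratio
lr_linear = math.exp(y)           # Eq. (9)
lr_density = gaussian_pdf(x, mu_m, sd) / gaussian_pdf(x, mu_f, sd)  # Eq. (3)

print(f"a = {a:.1f}, b = {b:.2f}, y = {y:.3f}")
print(f"LR from linear equation: {lr_linear:.3f}, from densities: {lr_density:.3f}")
```

The two routes agree to within floating-point precision, which is the point of the derivation in Eq. (5). As with the density-based sketch, the rounded inputs give a likelihood ratio of roughly 3.5 rather than the published 3.23.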

The bivariate example is graphically represented in Fig. 3(b). Note that the plane in Fig. 3(b) could be constructed by sweeping a probe around the x1-x2 plane of Fig. 3(a) and at each point calculating the natural logarithm of the height of the “male” surface relative to the height of the “female” surface. The multivariate equation in general would be: y = β0 + β1x1 + β2x2 + … + βmxm, in which β0 is the intercept and β1, …, βm are the slopes corresponding to the m dimensions of the data.

4. Calculating a likelihood ratio using logistic regression

Traditionally in forensic anthropology, logistic regression is used to calculate a posterior probability to which a threshold is then applied to make a classification. A posterior probability can be calculated as in Eq. (10), in which β0 is an intercept and β1, …, βm are slopes corresponding to the m dimensions of the data. The values for β0, …, βm are calculated using an iterative algorithm. We do not describe the details of fitting logistic-regression models here; the interested reader is referred to texts such as [45] and [46]. For our calculations, we used the Newton iterative fitting algorithm with conjugate gradient ascent.

$$
p(H_M \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m)}} \tag{10}
$$

Since HM and HF are mutually exclusive and exhaustive, p(HF|x) = 1 − p(HM|x), and Eq. (10) can be rearranged to obtain Eq. (11). Eq. (11) gives the logged posterior odds, and this is the form in which the model is actually fitted.

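$$
\ln\left(\frac{p(H_M \mid \mathbf{x})}{p(H_F \mid \mathbf{x})}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m \tag{11}
$$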

In order to use logistic regression to calculate a likelihood ratio, the data points in the training data should be weighted such that the two classes have the same weight; hence, p(HM)=p(HF), the prior odds p(HM)/p(HF)=1, and the posterior odds will equal the likelihood ratio (see Eq. (2)). Eqs. (12), (13), (14), (15) repeat the same examples as for linear discriminant analysis above but with the coefficient values (a and b, and β0, β1, β2) obtained using logistic regression. For parallelism with the linear equation derived for linear discriminant analysis in the previous section, the univariate example uses a and b for the intercept and slope. Note that the values for a and b in Eq. (12) are not the same as those obtained using linear discriminant analysis in Eq. (8). Figs. 4 and 5 show a graphical representation of the calculation of the likelihood ratio for the univariate and bivariate examples. Compare Figs. 4 and 5 with Figs. 2 and 3 respectively. In these examples, the slopes obtained using logistic regression are all shallower than the slopes obtained using linear discriminant analysis.

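$$
\ln(\Lambda(x)) = y = a + bx = -42.3 + 0.928 \times 46.5 = 0.815 \tag{12}
$$

$$
\Lambda(x) = e^{y} = e^{0.815} = 2.26 \tag{13}
$$

$$
\ln(\Lambda(\mathbf{x})) = y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 = -42.9 + 0.809 \times 46.5 + 0.102 \times 59.0 = 0.648 \tag{14}
$$

$$
\Lambda(\mathbf{x}) = e^{y} = e^{0.648} = 1.91 \tag{15}
$$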
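Once the coefficients have been fitted, evaluating Eqs. (12) to (15) is a matter of computing a linear predictor and exponentiating it. A minimal sketch follows, using the rounded coefficients printed above; the function name `log_likelihood_ratio` is hypothetical.

```python
import math


def log_likelihood_ratio(intercept, slopes, x):
    """Linear predictor of Eq. (11); with equal class weights this is ln(LR)."""
    return intercept + sum(b * v for b, v in zip(slopes, x))


# Rounded coefficients as printed in Eqs. (12) and (14)
y_uni = log_likelihood_ratio(-42.3, [0.928], [46.5])
y_bi = log_likelihood_ratio(-42.9, [0.809, 0.102], [46.5, 59.0])

lr_uni = math.exp(y_uni)
lr_bi = math.exp(y_bi)

# Eq. (10): with equal priors, posterior odds equal the likelihood ratio
posterior_male = 1.0 / (1.0 + math.exp(-y_uni))

print(f"univariate: y = {y_uni:.3f}, LR = {lr_uni:.2f}")
print(f"bivariate:  y = {y_bi:.3f}, LR = {lr_bi:.2f}")
```

Because the printed coefficients are rounded to three significant figures and the intercept and slope terms nearly cancel, the sketch gives log likelihood ratios of 0.852 and 0.737 rather than the published 0.815 and 0.648. Small rounding differences in coefficients of this size have a visible effect on the result.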

Fig. 4.

Example (based on humeral-head-diameter data from [18]) of calculation of likelihood ratio using a univariate logistic regression model. Compare Fig. 4(b) with Fig. 2(b).

Fig. 5.

Example (based on humeral head-diameter, HD, and epicondylar-breadth, EB, data from [18]) of calculation of likelihood ratio using a bivariate logistic regression model. Compare Fig. 5(b) with Fig. 3(b).

Logistic regression is a discriminative model, not a generative model – it does not actually calculate the ratio of two likelihoods – but under ideal circumstances it would give the same results as linear discriminant analysis ([47] §4.4.5).13 Because of its analogy with linear discriminant analysis, a generative model which actually calculates the ratio of two likelihoods, the output of logistic regression can be interpreted as a log likelihood ratio. Because it is not dependent on the assumptions of Gaussian distributions with the same covariance matrix, logistic regression is more robust than linear discriminant analysis when the data deviate from those assumptions. If the assumptions are met and the sample size is small, however, linear discriminant analysis will be less prone to overfit the training data.
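The class weighting described in this section can be illustrated with a small univariate fit. The sketch below uses plain Newton iterations, which is a simplification and not necessarily the fitting procedure used for the results in this paper. The function name and the training values are made up for illustration and are not taken from the Bartholdy et al. [18] dataset.

```python
import math


def fit_weighted_logistic(xs, ys, n_iter=50):
    """Fit ln(odds) = a + b*x by Newton's method with class-balancing weights.

    ys holds 1 for "male" and 0 for "female". Each class receives a total
    weight of 0.5, so the implied prior odds are 1 and the fitted log
    posterior odds can be read as a log likelihood ratio.
    """
    n_m = sum(ys)
    n_f = len(ys) - n_m
    w = [0.5 / n_m if y == 1 else 0.5 / n_f for y in ys]
    x_bar = sum(xs) / len(xs)
    xc = [x - x_bar for x in xs]  # centring improves numerical conditioning
    a = b = 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y, wi in zip(xc, ys, w):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            g_a += wi * (y - p)          # gradient of weighted log likelihood
            g_b += wi * (y - p) * x
            v = wi * p * (1.0 - p)       # entries of the negative Hessian
            h_aa += v
            h_ab += v * x
            h_bb += v * x * x
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-12:
            break
    return a - b * x_bar, b  # undo the centring of x


# Made-up head-diameter values (mm), overlapping so that the fit converges
males = [44.0, 46.5, 48.0, 49.5, 51.0, 52.5]
females = [38.0, 40.0, 41.5, 42.0, 43.0, 44.5, 45.0, 47.0]
xs = males + females
ys = [1] * len(males) + [0] * len(females)

a, b = fit_weighted_logistic(xs, ys)
print(f"a = {a:.2f}, b = {b:.3f}, LR at x = 46.5: {math.exp(a + b * 46.5):.2f}")
```

The weights compensate for the unequal numbers of males and females in the training data. Without them, the fitted intercept would absorb the log of the ratio of the class sample sizes, and the output would be log posterior odds under those sample-based priors rather than a log likelihood ratio.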

5. Calculating a likelihood ratio using quadratic discriminant analysis

Quadratic discriminant analysis is the same as linear discriminant analysis, except that (in the present context) instead of using a single covariance matrix Σ calculated using data pooled from male and female samples, it uses two separate covariance matrices. ΣM is calculated using data sampled from males and ΣF is calculated using data sampled from females. Eq. (16) gives the quadratic-discriminant-analysis version of the odds-form of Bayes’ Theorem, cf. Eq. (2).

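$$
\frac{p(H_M \mid \mathbf{x}_Q)}{p(H_F \mid \mathbf{x}_Q)} = \frac{f(\mathbf{x}_Q \mid \boldsymbol{\mu}_M, \boldsymbol{\Sigma}_M)}{f(\mathbf{x}_Q \mid \boldsymbol{\mu}_F, \boldsymbol{\Sigma}_F)} \times \frac{p(H_M)}{p(H_F)} \tag{16}
$$

$$
\text{posterior odds} = \text{likelihood ratio} \times \text{prior odds}
$$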

Fig. 6 and Eq. (17) show the univariate example of the calculation of a likelihood ratio, and Fig. 7 and Eq. (18) show the bivariate example. Note that in Figs. 6 and 7 the mapping functions between the scalar x and ln(Λ(x)) and between the vector x and ln(Λ(x)) are not linear; they are a curve and a curved surface respectively.

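$$
\Lambda(x) = \frac{f(x \mid \mu_M, \sigma_M)}{f(x \mid \mu_F, \sigma_F)} = \frac{f(x = 46.5 \mid \mu_M = 49.4, \sigma_M = 2.78)}{f(x = 46.5 \mid \mu_F = 41.6, \sigma_F = 2.31)} = \frac{0.0813}{0.0192} = 4.22 \tag{17}
$$

$$
\begin{aligned}
\Lambda(\mathbf{x}) &= \frac{f(\mathbf{x}_Q \mid \boldsymbol{\mu}_M, \boldsymbol{\Sigma}_M)}{f(\mathbf{x}_Q \mid \boldsymbol{\mu}_F, \boldsymbol{\Sigma}_F)}
= \frac{f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, \begin{bmatrix} \mu_{M,1} \\ \mu_{M,2} \end{bmatrix}, \begin{bmatrix} \sigma_{M,1,1} & \sigma_{M,1,2} \\ \sigma_{M,2,1} & \sigma_{M,2,2} \end{bmatrix}\right)}{f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, \begin{bmatrix} \mu_{F,1} \\ \mu_{F,2} \end{bmatrix}, \begin{bmatrix} \sigma_{F,1,1} & \sigma_{F,1,2} \\ \sigma_{F,2,1} & \sigma_{F,2,2} \end{bmatrix}\right)} \\
&= \frac{f\left(\begin{bmatrix} 46.5 \\ 59.0 \end{bmatrix} \,\middle|\, \begin{bmatrix} 49.4 \\ 63.9 \end{bmatrix}, \begin{bmatrix} 7.71 & 6.93 \\ 6.93 & 23.2 \end{bmatrix}\right)}{f\left(\begin{bmatrix} 46.5 \\ 59.0 \end{bmatrix} \,\middle|\, \begin{bmatrix} 41.6 \\ 55.3 \end{bmatrix}, \begin{bmatrix} 5.35 & 3.43 \\ 3.43 & 10.7 \end{bmatrix}\right)}
= \frac{0.00680}{0.00258} = 2.64
\end{aligned}
\tag{18}
$$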
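The univariate calculation in Eq. (17), and the curvature of the mapping from x to ln(Λ(x)), can be sketched as follows. The function name `qda_log_lr` is hypothetical, and the inputs are the rounded values printed in Eq. (17).

```python
import math


def qda_log_lr(x, mu_m, sd_m, mu_f, sd_f):
    """Natural-log likelihood ratio from two Gaussians with separate variances."""
    log_f_m = -math.log(sd_m) - 0.5 * ((x - mu_m) / sd_m) ** 2
    log_f_f = -math.log(sd_f) - 0.5 * ((x - mu_f) / sd_f) ** 2
    # The shared constant -0.5*ln(2*pi) cancels in the difference
    return log_f_m - log_f_f


params = dict(mu_m=49.4, sd_m=2.78, mu_f=41.6, sd_f=2.31)

lr = math.exp(qda_log_lr(46.5, **params))
print(f"univariate QDA LR at x = 46.5: {lr:.2f}")

# A straight line has zero second difference; the QDA mapping does not
y_left, y_mid, y_right = (qda_log_lr(x, **params) for x in (45.5, 46.5, 47.5))
second_difference = y_left - 2.0 * y_mid + y_right
print(f"second difference of ln(LR): {second_difference:.4f}")
```

With the rounded inputs the sketch gives a likelihood ratio of roughly 4.6 rather than the published 4.22, presumably again because of rounding. The non-zero second difference reflects the quadratic term that remains when the two variances are not equal.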

Fig. 6.

Example (based on humeral-head-diameter data from [18]) of calculation of likelihood ratio using a univariate quadratic discriminant analysis model.

Fig. 7.

Example (based on humeral head-diameter, HD, and epicondylar-breadth, EB, data from [18]) of calculation of likelihood ratio using a bivariate quadratic discriminant analysis model.

6. Validation of likelihood-ratio models

The performance of a model is assessed by:

  1. Taking data that represent the relevant population for the case, that reflect conditions of the case, and for which the true class of each datum is known (e.g., each measurement vector is made on a humerus known to be from a male or known to be from a female from the population of interest);

  2. Inputting each measurement vector into the model;

  3. Then comparing the output of the model in response to each input with the known truth about the class of the corresponding input.

The test data must be separate from the data used to train the model, otherwise the results will be overly optimistic with respect to how well the model will perform when applied to previously unseen data, e.g., the measurements made on the humerus of questioned biological sex in the case.

Typically in the forensic-anthropology literature, the results are summarized using correct-classification rate, i.e., the proportion of all inputs that were correctly classified.14 In the examples used in the present paper, the class of each input is either “male” or “female”. In a classification framework, the class of each output would be either “male” or “female”. If there is an imbalance in the number of “male” inputs and the number of “female” inputs in the validation data, the correct-classification rate can be separately calculated for each input class, then the mean over both classes calculated.

An alternative to correct-classification rate is classification-error rate, which is the proportion of inputs that were misclassified. This is equivalent to one minus the correct-classification rate.15 The classification-error rate, Eclass, with equal weighting for each class can be calculated as in Eq. (19), in which NM and NF are the number of inputs in the validation data known to be from males and the number of inputs in the validation data known to be from females respectively, and YM and YF are classification outputs from the model in response to inputs known to be from males and inputs known to be from females respectively. In Eq. (19), a cost of 0 is assigned for a correct classification and a cost of 1 for an incorrect classification, the mean cost is calculated for inputs known to be from males and separately the mean cost is calculated for inputs known to be from females, then the mean of the latter two means is calculated. Eclass is an average cost calculated over all the test data.

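$$
E_{class} = \frac{1}{2}\left(\frac{1}{N_M}\sum_{i=1}^{N_M}\begin{cases} 0 & \text{if } Y_{M_i} = M \\ 1 & \text{if } Y_{M_i} = F \end{cases} + \frac{1}{N_F}\sum_{j=1}^{N_F}\begin{cases} 0 & \text{if } Y_{F_j} = F \\ 1 & \text{if } Y_{F_j} = M \end{cases}\right) \tag{19}
$$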

Eclass is a number between 0 and 1 inclusive. Lower Eclass values indicate better performance, i.e., fewer misclassifications. The expected Eclass value for a model whose output was random would be 0.5. A model with an Eclass value greater than 0.5 would be performing worse than chance.
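Eq. (19) translates directly into code. In the sketch below, the function name and the example classification outputs are hypothetical; the outputs are coded as the strings "M" and "F".

```python
def classification_error_rate(outputs_m, outputs_f):
    """Class-balanced classification-error rate, Eclass, as in Eq. (19).

    outputs_m: model classifications for inputs known to be from males.
    outputs_f: model classifications for inputs known to be from females.
    """
    cost_m = sum(1 for y in outputs_m if y != "M") / len(outputs_m)
    cost_f = sum(1 for y in outputs_f if y != "F") / len(outputs_f)
    return 0.5 * (cost_m + cost_f)


# Hypothetical validation outputs: 1 of 4 males and 1 of 5 females misclassified
outputs_m = ["M", "M", "F", "M"]
outputs_f = ["F", "F", "F", "F", "M"]

e_class = classification_error_rate(outputs_m, outputs_f)
print(f"Eclass = {e_class:.3f}")
```

Averaging the two per-class error rates, rather than pooling all inputs, keeps the metric from being dominated by whichever class happens to have more validation data.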

In the likelihood-ratio framework, the output of the model is not a classification but a continuously-valued likelihood-ratio value. In our examples, which have HM in the numerator and HF in the denominator, the higher the likelihood-ratio value the greater the support for HM relative to HF and the lower the likelihood-ratio value the greater the support for HF relative to HM. If the input is from a male, the higher the likelihood-ratio value the greater the support for the correct hypothesis relative to the incorrect hypothesis. Mutatis mutandis, if the input is from a female, the lower the likelihood-ratio value the greater the support for the correct hypothesis relative to the incorrect hypothesis. Therefore, in order to assess the performance of a model that outputs likelihood ratios, we should not assign a cost of 0 or 1 based on classification, but rather a cost based on how good or how bad each likelihood-ratio value is:

  • If we know the input was from a male we should assign a small cost value for a very large likelihood-ratio value, a larger cost value for a smaller likelihood-ratio value, and a much larger cost value for a very small likelihood-ratio value.

  • Mutatis mutandis, if we know the input was from a female we should assign a small cost value for a very small likelihood-ratio value, a larger cost value for a larger likelihood-ratio value, and a much larger cost value for a very large likelihood-ratio value.

A commonly used metric in the forensic-inference-and-statistics literature (and especially in the forensic-voice-comparison literature [21]) is the log-likelihood-ratio cost, Cllr [48], see Eq. (20), in which ΛM and ΛF are likelihood-ratio outputs from the model in response to inputs known to be from males and inputs known to be from females respectively. The functions within the leftmost summation and rightmost summation of Eq. (20) are plotted in Fig. 8.

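$$
C_{llr} = \frac{1}{2}\left( \frac{1}{N_M}\sum_{i=1}^{N_M} \log_2\left(1 + \frac{1}{\Lambda_{M_i}}\right) + \frac{1}{N_F}\sum_{j=1}^{N_F} \log_2\left(1 + \Lambda_{F_j}\right) \right) \tag{20}
$$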

Fig. 8. Cost functions within the leftmost summation and rightmost summation of Eq. (20).

Cllr is a number between 0 and +∞. Lower Cllr values indicate better performance. A model that always output a likelihood ratio of 1 irrespective of the input would give no useful information: the posterior odds would always be the same as the prior odds. A model that gave no useful information would have a Cllr value of 1. Models that are miscalibrated can have Cllr values substantially larger than 1, but their performance can be improved by calibrating the system (see [49] for an introduction to this topic). Well calibrated systems will have Cllr values in the range 0 to ∼1.
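A short Python sketch of Eq. (20) may help make these properties concrete. The function name `cllr` and all likelihood-ratio values below are hypothetical, chosen only to illustrate the uninformative case (every likelihood ratio equal to 1) and the effect of likelihood ratios that support the correct or the contrary-to-fact hypothesis.

```python
import math


def cllr(lrs_male, lrs_female):
    """Log-likelihood-ratio cost, following Eq. (20).

    lrs_male: likelihood ratios (H_M over H_F) for inputs known to be from males.
    lrs_female: likelihood ratios (H_M over H_F) for inputs known to be from females.
    """
    # Male inputs: the cost shrinks as the likelihood ratio grows.
    male_term = sum(math.log2(1 + 1 / lr) for lr in lrs_male) / len(lrs_male)
    # Female inputs: the cost shrinks as the likelihood ratio approaches 0.
    female_term = sum(math.log2(1 + lr) for lr in lrs_female) / len(lrs_female)
    return 0.5 * (male_term + female_term)


# A model that always outputs a likelihood ratio of 1 gives a Cllr of 1.
uninformative = cllr([1.0, 1.0, 1.0], [1.0, 1.0])
# Hypothetical results that strongly support the correct hypotheses.
good = cllr([100.0, 1000.0], [0.01, 0.001])
# The same values assigned to the wrong classes (contrary-to-fact support).
bad = cllr([0.01, 0.001], [100.0, 1000.0])
print(uninformative, good, bad)
```

In this sketch the third case yields a value well above 1, showing how strongly misleading likelihood ratios are penalized more heavily than a simple 0 or 1 classification cost would allow.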

Returning to our univariate and bivariate examples, we validate the previously described models using leave-one-out cross validation, in which one measurement vector is held out, the remainder of the vectors are used to train the model, and the likelihood-ratio value is then calculated for the held-out vector. This is then repeated holding out each measurement vector in turn. This makes best use of the limited amount of data available while still avoiding training and testing on the same data. The resulting Cllr values are given in Table 2.
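The leave-one-out procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' software: the measurements are synthetic values drawn from made-up Gaussian distributions, the function names are hypothetical, and the model is a univariate equal-variance Gaussian model in the spirit of linear discriminant analysis, with the pooled variance divided by n − 1 as described in note 12.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Probability density of a univariate Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def lda_lr(x, males, females):
    """Likelihood ratio p(x | H_M) / p(x | H_F) with a pooled (shared) variance."""
    mean_m = sum(males) / len(males)
    mean_f = sum(females) / len(females)
    # Subtract each class mean, pool the deviations, and divide by n - 1.
    ss = sum((v - mean_m) ** 2 for v in males) + sum((v - mean_f) ** 2 for v in females)
    var = ss / (len(males) + len(females) - 1)
    return gaussian_pdf(x, mean_m, var) / gaussian_pdf(x, mean_f, var)


def leave_one_out_lrs(males, females):
    """Hold out each measurement in turn, train on the rest, and score the held-out one."""
    lrs_m = [lda_lr(x, males[:i] + males[i + 1:], females) for i, x in enumerate(males)]
    lrs_f = [lda_lr(x, males, females[:j] + females[j + 1:]) for j, x in enumerate(females)]
    return lrs_m, lrs_f


# Synthetic measurements (hypothetical means and spread, not real osteometric data).
rng = random.Random(0)
males = [rng.gauss(48.0, 2.5) for _ in range(30)]
females = [rng.gauss(42.0, 2.5) for _ in range(30)]

lrs_m, lrs_f = leave_one_out_lrs(males, females)
prop_m = sum(lr > 1 for lr in lrs_m) / len(lrs_m)
prop_f = sum(lr < 1 for lr in lrs_f) / len(lrs_f)
print(prop_m, prop_f)
```

The two lists of likelihood ratios returned by `leave_one_out_lrs` are the kind of validation output from which Cllr can be calculated and a Tippett plot drawn.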

Table 2. Cllr values for different likelihood-ratio models applied to data from [18].

| Model | Univariate (head diameter) | Bivariate (head diameter, epicondylar breadth) |
| --- | --- | --- |
| Linear discriminant analysis | 0.300 | 0.341 |
| Logistic regression | 0.306 | 0.349 |
| Quadratic discriminant analysis | 0.321 | 0.339 |

Based on the Cllr values in Table 2, the univariate models performed better than the bivariate models.16 The simpler univariate linear models (linear discriminant analysis and logistic regression) also performed a little better than the more complex univariate quadratic discriminant analysis.

A graphical representation of likelihood-ratio validation results commonly used in the forensic-inference-and-statistics literature (and especially in the forensic-voice-comparison literature [21]) is a Tippett plot [50]. Tippett plots for the previously described likelihood-ratio models are given in Fig. 9. The likelihood-ratio value corresponding to each measurement vector is plotted as a dot, and straight lines are drawn between adjacent dots. In our examples, a Tippett plot displays the empirical cumulative distribution of all the likelihood-ratio values resulting from test data known to be from males, and the empirical cumulative distribution of all the likelihood-ratio values resulting from test data known to be from females. The empirical cumulative distributions are plotted so that for the curve rising to the right the value on the y axis is the proportion of male inputs resulting in likelihood-ratio values equal to or less than the value on the x axis, and for the curve rising to the left the value on the y axis is the proportion of female inputs resulting in likelihood-ratio values equal to or greater than the value on the x axis.
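The coordinates of the two curves described above can be computed as in the following sketch. The function name `tippett_curves` and the likelihood-ratio values are hypothetical, the x axis is taken to be the natural logarithm of the likelihood ratio, and the sketch assumes there are no tied values within a class. Drawing the curves with a plotting library is left out.

```python
import math


def tippett_curves(lrs_male, lrs_female):
    """Return (x, y) points for the two empirical cumulative curves of a Tippett plot."""
    log_m = sorted(math.log(lr) for lr in lrs_male)
    log_f = sorted(math.log(lr) for lr in lrs_female)
    n_m, n_f = len(log_m), len(log_f)
    # Male curve rises to the right: proportion of male results <= x.
    male_curve = [(x, (i + 1) / n_m) for i, x in enumerate(log_m)]
    # Female curve rises to the left: proportion of female results >= x.
    female_curve = [(x, (n_f - j) / n_f) for j, x in enumerate(log_f)]
    return male_curve, female_curve


# Hypothetical validation results, including one misleading value in each class.
male_curve, female_curve = tippett_curves([0.5, 20.0, 300.0], [0.002, 0.1, 3.0])
for point in male_curve + female_curve:
    print(point)
```

Points on the male curve to the left of ln(LR) = 0, and points on the female curve to the right of it, correspond to likelihood ratios that support the contrary-to-fact hypothesis.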

Fig. 9. Tippett plots for different likelihood-ratio models applied to data from [18]. (a) Univariate linear discriminant analysis. (b) Bivariate linear discriminant analysis. (c) Univariate logistic regression. (d) Bivariate logistic regression. (e) Univariate quadratic discriminant analysis. (f) Bivariate quadratic discriminant analysis. In each panel, the dot in the middle of the circle corresponds to the result from the example feature vector.

In general, the better the performance of the system that generated the likelihood-ratio results, the greater the separation between the “male” and “female” curves on the Tippett plots, and, concomitantly, the shallower the slopes of the curves. Given this, the results from quadratic discriminant analysis (shown in the bottom panels of Fig. 9) may appear to be better than the results from linear models (linear discriminant analysis and logistic regression shown in the top and middle panels), but the results from quadratic discriminant analysis also include some large-magnitude positive log-likelihood-ratio values for bones known to be from females. The results from the bivariate models (shown in the panels on the right) also include some large-magnitude positive log-likelihood-ratio values for bones known to be from females, and, in addition, some large-magnitude negative log-likelihood-ratio values for bones known to be from males. The extent of these likelihood-ratio results supporting contrary-to-fact hypotheses is less for the univariate linear models: univariate linear discriminant analysis and univariate logistic regression (shown in panels (a) and (c)). As already indicated by the Cllr values, the best results were obtained for the univariate linear models.

All models provide useful information (their Cllr values are substantially less than 1) and appear to give reasonably well-calibrated output: the curves in the Tippett plots cross relatively close to ln(LR) = 0. For more complex models in which larger numbers of parameter values need to be estimated, it is usually necessary to calibrate their output using an explicit calibration model, see [51], [21], and [52].

Some of the models output likelihood-ratio values into the tens of thousands and even into the millions. These numbers are difficult to justify given the small sample sizes. To avoid complicating the present paper we do not address this issue here, but direct the interested reader to some solutions explored in [53].

Considering both Cllr and Tippett plots and the discussion above, given the Bartholdy et al. [18] dataset, the univariate linear discriminant analysis model appears to have resulted in the best performance. Note that it did not give the “best” results for the example feature vector (it did not give the largest likelihood-ratio value for this male feature vector), but it gave the best results averaged over all feature vectors. Given the small dataset, its lack of relevance for any modern forensic context, and the fact that the epicondylar-breadth data violate the assumptions of all the models tested, one should not draw any generalizations from any of the particular results presented here.

For other descriptions of both Cllr and Tippett plots see [[54], [55], [56], [57]] and [21].

7. Conclusion

Use of the likelihood-ratio framework for evaluation of forensic evidence is advocated by many who work in the area of forensic inference and statistics, and in guidance documents issued by prominent organizations. So far, there has been little use of the likelihood-ratio framework in forensic anthropology, but, with respect to adoption of the likelihood-ratio framework, forensic anthropology has advantages over some other branches of forensic science: it is a branch of forensic science in which it is already common to draw inferences on the basis of relevant data, quantitative measurements, and statistical models. In the present paper, we explained how to calculate likelihood ratios using anthropometric data, and statistical models that are already commonly used in forensic anthropology: linear discriminant analysis, quadratic discriminant analysis, and logistic regression. We also explained how to empirically validate likelihood-ratio models. We hope that this will contribute to greater understanding and wider adoption of the likelihood-ratio framework in forensic-anthropology research and practice.

Disclaimer

All opinions expressed in the present paper are those of the authors, and, unless explicitly stated otherwise, should not be construed as representing the policies or positions of any organizations with which the authors are associated.

Author contributions

Geoffrey Stewart Morrison: Conceptualization, Writing – original draft, Writing – review & editing, Funding acquisition. Philip Weber: Investigation, Software, Visualization, Writing – review & editing. Nabanita Basu: Investigation, Software, Writing – review & editing. Roberto Puch-Solis: Software, Writing – review & editing. Patrick S. Randolph-Quinney: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported by Research England's Expanding Excellence in England Fund as part of funding for the Aston Institute for Forensic Linguistics 2019–2022.

1

In the forensic-anthropology literature, the term “sectioning point” is often used rather than “threshold”.

2

Jerković et al. [17] claims that a 95% correct-classification rate “is the minimal level set by modern forensic and legal standards”. We traced the publications that Jerković et al. [17] cited in support of this claim and the publications cited in those publications, but could find no support for the claim that this is a legal requirement. Nor could we find any evidence that it is a requirement of any standard on forensic science developed by a national or international standards-development organization.

3

Note that if the data used for training and testing the statistical models were sampled from the same populations and the population distributions conformed to the assumptions of the models, then the expected value of the correct-classification rate would be predictable from the posterior-probability threshold and vice versa. If the posterior-probability threshold were τ, i.e., only data with posterior probabilities for male p(HM) ≥ τ and p(HM) ≤ (1 − τ), or equivalently p(HF) ≥ τ, were used for calculating the correct-classification rate κ, then the expected correct-classification rate would be κ = (p(p(HM) ≥ τ) + p(p(HF) ≥ τ))/2 = (τ + τ)/2 = τ, e.g., if τ = 0.95 then the expected value for κ = 0.95. Despite the difference in name, “posterior-probability threshold” versus “correct-classification rate”, τ and κ represent the same underlying concept, with τ being the predicted value and κ being the empirically derived value. In practice τ and κ would usually differ because of violations of model assumptions, model overfitting, and/or sampling variability. With respect to sampling variability, keeping τ fixed but changing the sample used for training or the sample used for testing would usually result in a different value for κ. Separate sets of training and test data are used to assess the extent to which the model is useful.

10

The likelihood-ratio framework for evaluation of forensic evidence should not be confused with likelihood-ratio tests used to assess difference in goodness of fit between competing models. Konigsberg et al. [42], for example, makes use of likelihood-ratio tests. Other references to likelihood ratios in that paper, e.g., “Taken as an evidentiary problem and assuming equal priors for male as for female within the population at large, the LR from the quadratic discriminant function is 1.997. This is found by calculating the [multivariate normal] density for obtaining ‘Mr. Johnson's’ measurements from the males and from the females …, averaging these densities across the two sexes, and dividing the male density by this average.” (p. 80), are not likelihood ratios as understood in the likelihood-ratio framework. As defined in the quote, they are twice the posterior probability. The definition in the quote is equivalent to our Eq. (1) multiplied by two and assuming equal priors. The likelihood ratio corresponding to the value stated in the quote would actually be 666.
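The arithmetic behind the value of 666 can be written out as follows. The symbol x for the measurement vector is an assumption made for this sketch. With equal priors, the posterior odds equal the likelihood ratio, so the quoted value only needs to be halved and converted from a probability to odds.

```latex
\begin{align*}
2\,p(H_M \mid x) &= 1.997 \\
p(H_M \mid x) &= 0.9985 \\
\Lambda = \frac{p(H_M \mid x)}{1 - p(H_M \mid x)} &= \frac{0.9985}{0.0015} \approx 666
\end{align*}
```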

11

Discriminant analysis assumes that the data from each class have Gaussian distributions, and linear discriminant analysis further assumes that the distributions from all classes have the same variance (in the univariate case) or the same covariance matrix (in the multivariate case). Histogram plots of the Bartholdy et al. [18] data reveal that these assumptions do not hold for epicondylar-breadth measurements: the female data appear to have a positive skew and the male data appear to be bimodal. Logistic regression is more robust to violations of these assumptions, but will not be robust to the bimodal distribution of the male data. Exploratory analysis of the data therefore suggests that none of linear discriminant analysis, quadratic discriminant analysis, or logistic regression is appropriate. Our purpose here, however, is simply to illustrate how to use these models, which are common in forensic anthropology, to calculate likelihood ratios. Whether these are good models to apply to these data, how well they perform when applied to these data, and the likelihood-ratio values that they output when applied to these data are not actually of concern. Use of linear discriminant analysis and logistic regression in the present paper also allows for direct comparison with their use in Bartholdy et al. [18] with the same dataset.

12

We used the formula for the unbiased estimate of the covariance matrix, i.e., dividing by n − 1 rather than by n (where n is the number of data points used to calculate the covariance matrix). We gave equal weight to each data point, i.e., we subtracted the class mean from the data in each class, pooled the data, and then calculated the covariance matrix.
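The pooling procedure described in this note can be sketched as follows. The function name `pooled_covariance` and the tiny bivariate dataset are hypothetical; the divisor n − 1 follows the description in the note, with n taken to be the total number of pooled data points.

```python
def pooled_covariance(classes):
    """Pooled covariance matrix from a list of classes.

    classes: list of classes, each a list of equal-length measurement vectors.
    """
    centered = []
    for data in classes:
        dim = len(data[0])
        # Subtract the class mean from every vector in the class.
        mean = [sum(row[k] for row in data) / len(data) for k in range(dim)]
        centered.extend([[row[k] - mean[k] for k in range(dim)] for row in data])
    # Pool the centered data and divide the sums of products by n - 1.
    n = len(centered)
    dim = len(centered[0])
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]


# Hypothetical bivariate measurements for two classes.
males = [[1.0, 2.0], [3.0, 4.0]]
females = [[0.0, 0.0], [2.0, 4.0]]
cov = pooled_covariance([males, females])
print(cov)
```

Every data point contributes equally to the pooled estimate, regardless of which class it belongs to.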

13

A generative model is a model that estimates a probability density for the measurements.

14

In the forensic-anthropology literature, correct-classification rate is usually expressed as a percentage. In the present paper, we express it as a proportion.

15

If classification-error rate and correct-classification rate are expressed as percentages, the classification-error rate is 100 minus the correct-classification rate.

16

As mentioned in note 11, the epicondylar-breadth data violate the assumptions of all the models tested. Epicondylar breadth and head diameter were also highly correlated (Pearson's linear correlation coefficient ρ = 0.794). There may have been little additional useful information that the bivariate models could have exploited compared to their univariate counterparts, especially given the sampling variability associated with the small sample sizes. Univariate models based on epicondylar breadth had Cllr values in the range 0.5–0.6.

References

  • 1. Stewart T.D. Charles C. Thomas; Springfield, IL: 1979. Essentials of Forensic Anthropology.
  • 2. Randolph-Quinney P.S., Mallett X., Black S.M. In: Jamieson A., Moenssens A., editors. Wiley; Chichester, UK: 2009. Anthropology; pp. 152–178. (Wiley Encyclopedia of Forensic Science).
  • 3. Dirkmaat D.C., Cabo L.L. In: A Companion to Forensic Anthropology. Dirkmaat D.C., editor. Wiley Blackwell; Oxford, UK: 2012. Forensic anthropology: embracing the new paradigm; pp. 3–40.
  • 4. Obertová Z., Stewart A., Cattaneo C., editors. Statistics and Probability in Forensic Anthropology. Elsevier; London, UK: 2020.
  • 5. Obertová Z., Stewart A. In: Statistics and Probability in Forensic Anthropology. Obertová Z., Stewart A., Cattaneo C., editors. Elsevier; London, UK: 2020. Probability distributions, hypothesis testing, and analysis; pp. 73–86.
  • 6. Nikita E., Gracía-Donad J.G., Nikitas P., Obertová Z., Kranioti E.F. In: Statistics and Probability in Forensic Anthropology. Obertová Z., Stewart A., Cattaneo C., editors. Elsevier; London, UK: 2020. Sex estimation using nonmetric variables: application of R functions; pp. 139–154.
  • 7. Galeta P., Brůžek J. In: Statistics and Probability in Forensic Anthropology. Obertová Z., Stewart A., Cattaneo C., editors. Elsevier; London, UK: 2020. Sex estimation using continuous variables: problems and principles of sex classification in the zone of uncertainty; pp. 155–182.
  • 8. Pons J. The sexual diagnosis of isolated bones of the skeleton. Hum. Biol. 1955;27:12–21.
  • 9. Giles E., Elliot O. Sex determination by discriminant function analysis of crania. Am. J. Phys. Anthropol. 1963;21:53–68. doi: 10.1002/ajpa.1330210108.
  • 10. Saunders S.R., Hoppa R.D. Sex allocation from long bone measurements using logistic regression. J. Can. Soc. Forensic. Sci. 1997;30(2):49–60. doi: 10.1080/00085030.1997.10757086.
  • 11. Ekizoglu O., Inci E., Palabiyik F.B., Can I.O., Er A., Bozdag M., Kacmaz I.E., Kranioti E.F. Sex estimation in a contemporary Turkish population based on CT scans of the calcaneus. Forensic Sci. Int. 2017;279 doi: 10.1016/j.forsciint.2017.07.038. 310e1–310e6.
  • 12. Nuzzolese E., Randolph-Quinney P., Randolph-Quinney J., Di Vella G. Geometric morphometric analysis of sexual dimorphism in the mandible from panoramic X-ray images. J. Forensic Odonto-Stomatology. 2019;37(2):35–44.
  • 13. Bidmos M.A., Adebesin A.A., Mazengenya P., Olateju O.I., Adegboy O. Estimation of sex from metatarsals using discriminant function and logistic regression analyses. Aust. J. Forensic Sci. 2021;53:543–556. doi: 10.1080/00450618.2019.1711180.
  • 14. Murail P., Brůžek J., Braga J. A new approach to sexual diagnosis in past populations. practical adjustments from van Vark's procedure. Int. J. Osteoarchaeol. 1999;9:39–53. doi: 10.1002/(SICI)1099-1212(199901/02)9:1<39::AID-OA458>3.0.CO;2-V.
  • 15. Brůžek J., Santos F., Dutailly B., Murail P., Cunha E. Validation and reliability of the sex estimation of the human os coxae using freely available DSP2 software for bioarchaeology and forensic anthropology. Am. J. Phys. Anthropol. 2017;164:440–449. doi: 10.1002/ajpa.23282.
  • 16. Hora M., Sládek V. Population specificity of sex estimation from vertebrae. Forensic Sci. Int. 2018;291:279. doi: 10.1016/j.forsciint.2018.08.015. e1–279.e12.
  • 17. Jerković I., Bašić Ž., Anđelinović Š., Kružić I. Adjusting posterior probabilities to meet predefined accuracy criteria: a proposal for a novel approach to osteometric sex estimation. Forensic Sci. Int. 2020;311 doi: 10.1016/j.forsciint.2020.110273. article 110273.
  • 18. Bartholdy B.P., Sandoval E., Hoogland M.L.P., Schrader S.A. Getting rid of dichotomous sex estimations: why logistic regression should be preferred over discriminant function analysis. J. Forensic Sci. 2020;65:1685–1691. doi: 10.1111/1556-4029.14482.
  • 19. Aitken C.G.G., Berger C.E.H., Buckleton J.S., Champod C., Curran J.M., Dawid A.P., Evett I.W., Gill P., González-Rodríguez J., Jackson G., Kloosterman A., Lovelock T., Lucy D., Margot P., McKenna L., Meuwly D., Neumann C., Nic Daéid N., Nordgaard A., Puch-Solis R., Rasmusson B., Redmayne M., Roberts P., Robertson B., Roux C., Sjerps M.J., Taroni F., Tjin-A-Tsoi T., Vignaux G.A., Willis S.M., Zadora G. Expressing evaluative opinions: a position statement. Sci. Justice. 2011;51:1–2. doi: 10.1016/j.scijus.2011.01.002.
  • 20. Morrison G.S., Kaye D.H., Balding D.J., Taylor D., Dawid P., Aitken C.G.G., Gittelson S., Zadora G., Robertson B., Willis S.M., Pope S., Neil M., Martire K.A., Hepler A., Gill R.D., Jamieson A., de Zoete J., Ostrum R.B., Caliebe A. A comment on the PCAST report: skip the “match”/“non-match” stage. Forensic Sci. Int. 2017;272 doi: 10.1016/j.forsciint.2016.10.018. e7–e9.
  • 21. Morrison G.S., Enzinger E., Hughes V., Jessen M., Meuwly D., Neumann C., Planting S., Thompson W.C., van der Vloed D., Ypma R.J.F., Zhang C., Anonymous A., Anonymous B. Consensus on validation of forensic voice comparison. Sci. Justice. 2021;61:229–309. doi: 10.1016/j.scijus.2021.02.002.
  • 22. Association of Forensic Science Providers Standards for the formulation of evaluative forensic science expert opinion. Sci. Justice. 2009;49:161–164. doi: 10.1016/j.scijus.2009.07.004.
  • 23. Aitken C.G.G., Roberts P., Jackson G. Royal Statistical Society; London, UK: 2010. Fundamentals of Probability and Statistical Evidence in Criminal Proceedings: Guidance for Judges, Lawyers, Forensic Scientists and Expert Witnesses. https://rss.org.uk/news-publication/publications/law-guides/
  • 24. Willis S.M., McKenna L., McDermott S., O'Donell G., Barrett A., Rasmusson A., Nordgaard A., Berger C.E.H., Sjerps M.J., Lucena-Molina J.J., Zadora G., Aitken C.G.G., Lunt L., Champod C., Biedermann A., Hicks T.N., Taroni F. European Network of Forensic Science Institutes; 2015. ENFSI Guideline for Evaluative Reporting in Forensic Science. http://enfsi.eu/wp-content/uploads/2016/09/m1_guideline.pdf
  • 25. Ballantyne K., Bunford J., Found B., Neville D., Taylor D., Wevers G., Catoggio D. National Institute of Forensic Science of the Australia New Zealand Policing Advisory Agency; 2017. An Introductory Guide to Evaluative Reporting. http://www.anzpaa.org.au/forensic-science/our-work/projects/evaluative-reporting
  • 26. Kafadar K., Stern H., Cuellar M., Curran J., Lancaster M., Neumann C., Saunders C., Weir B., Zabell S. American Statistical Association; 2019. American Statistical Association Position on Statistical Statements for Forensic Evidence. https://www.amstat.org/asa/files/pdfs/POL-ForensicScience.pdf
  • 27. Forensic Science Regulator, Codes of Practice and Conduct: Development of Evaluative Opinions (FSR-C-118 Issue 1) Forensic Science Regulator; Birmingham, UK: 2021. https://www.gov.uk/government/publications/development-of-evaluative-opinions
  • 28. Lucy D. Wiley; Chichester, UK: 2005. Introduction to Statistics for Forensic Scientists.
  • 29. Zadora G., Agnieszka M., Ramos D., Aitken C.G.G. Wiley; Chichester, UK: 2014. Statistical Analysis in Forensic Science: Evidential Value of Multivariate Physicochemical Data.
  • 30. Balding D.J., Steele C. second ed. Wiley; Chichester, UK: 2015. Weight-of-Evidence for Forensic DNA Profiles.
  • 31. Adam C. Wiley; Chichester, UK: 2016. Forensic Evidence in Court: Evaluation and Scientific Opinion.
  • 32. Buckleton J.S., Bright J.A., Taylor D., editors. Forensic DNA Evidence Interpretation. second ed. CRC; Boca Raton, FL: 2016.
  • 33. Robertson B., Vignaux G.A., Berger C.E.H. second ed. Wiley; Chichester, UK: 2016. Interpreting Evidence: Evaluating Forensic Science in the Courtroom.
  • 34. Morrison G.S., Enzinger E., Zhang C. In: Expert Evidence. Freckelton I., Selby H., editors. Thomson Reuters; Sydney, Australia: 2018. Forensic speech science. ch. 99.
  • 35. Aitken C.G.G., Taroni F., Bozza S. third ed. Wiley; Chichester, UK: 2021. Statistics and the Evaluation of Evidence for Forensic Scientists.
  • 36. de Boer H.H., Blau S., Delabarde T., Hackman L. The role of forensic anthropology in disaster victim identification (DVI): recent developments and future prospects. Forensic Science Research. 2019;4:303–315. doi: 10.1080/20961790.2018.1480460.
  • 37. de Boer H.H., van Wijk M., Berger C.E.H. In: Statistics and Probability in Forensic Anthropology. Obertová Z., Stewart A., Cattaneo C., editors. Elsevier; London, UK: 2020. Communicating evidence with a focus on the use of Bayes' theorem; pp. 331–340.
  • 38. Berger C.E.H., de Boer H.H., van Wijk M. In: Statistics and Probability in Forensic Anthropology. Obertová Z., Stewart A., Cattaneo C., editors. Elsevier; London, UK: 2020. Use of Bayes' Theorem in data analysis and interpretation; pp. 125–135.
  • 39. Berger C.E.H., van Wijk M., de Boer H.H. In: Statistics and Probability in Forensic Anthropology. Obertová Z., Stewart A., Cattaneo C., editors. Elsevier; London, UK: 2020. Bayesian inference in personal identification; pp. 301–312.
  • 40. Verma R., Krishan K., Rani D., Kumar A., Sharma V. Stature estimation in forensic examinations using regression analysis: a likelihood ratio perspective. Forensic Sci. Int.: Report. 2020;2 doi: 10.1016/j.fsir.2020.100069. article 100069.
  • 41. Verma R., Krishan K., Rani D., Kumar A., Sharma V., Shreshtha R., Kanchan T. Estimation of sex in forensic examinations using logistic regression and likelihood ratios. Forensic Sci. Int.: Report. 2020;2 doi: 10.1016/j.fsir.2020.100118. article 100118.
  • 42. Konigsberg L.W., Algee-Hewitt B.F., Steadman D.W. Estimation and evidence in forensic anthropology: sex and race. Am. J. Phys. Anthropol. 2009;139:77–90. doi: 10.1002/ajpa.20934.
  • 43. Fisher R.A. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 1936;7:179–188. doi: 10.1111/j.1469-1809.1936.tb02137.x.
  • 44. Klecka W.R. Sage; Beverly Hills, CA: 1980. Discriminant Analysis.
  • 45. Menard S. Sage; Thousand Oaks, CA: 2010. Logistic Regression: from Introductory to Advanced Concepts and Applications.
  • 46. Hosmer D.W., Jr., Lemeshow S., Sturdivant R.X. third ed. Wiley; Hoboken, NJ: 2013. Applied Logistic Regression.
  • 47. Hastie T., Tibshirani R., Friedman J. second ed. Springer; New York: 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
  • 48. Brümmer N., du Preez J. Application independent evaluation of speaker detection. Comput. Speech Lang. 2006;20:230–275. doi: 10.1016/j.csl.2005.08.001.
  • 49. Morrison G.S. Tutorial on logistic-regression calibration and fusion: converting a score to a likelihood ratio. Aust. J. Forensic Sci. 2013;45:173–197. doi: 10.1080/00450618.2012.733025.
  • 50. Meuwly D. Doctoral dissertation, University of Lausanne; 2001. Reconnaissance de locuteurs en sciences forensiques: l’apport d’une approche automatique.
  • 51. Morrison G.S., Enzinger E., Ramos D., González-Rodríguez J., Lozano-Díez A. In: Handbook of Forensic Statistics. Banks D.L., Kafadar K., Kaye D.H., Tackett M., editors. CRC; Boca Raton, FL: 2020. Statistical models in forensic voice comparison; pp. 451–497.
  • 52. Morrison G.S. In the context of forensic casework, are there meaningful metrics of the degree of calibration? Forensic Sci. Int.: Synergy. 2021;3 doi: 10.1016/j.fsisyn.2021.100157. article 100157.
  • 53. Morrison G.S., Poh N. Avoiding overstating the strength of forensic evidence: shrunk likelihood ratios/Bayes factors. Sci. Justice. 2018;58:200–218. doi: 10.1016/j.scijus.2017.12.005.
  • 54. González-Rodríguez J., Rose P., Ramos D., Toledano D.T., Ortega-García J. Emulating DNA: rigorous quantification of evidential weight in transparent and testable forensic speaker recognition. IEEE Trans. Audio Speech Lang. Process. 2007;15:2104–2115. doi: 10.1109/TASL.2007.902747.
  • 55. Drygajlo A., Jessen M., Gfroerer S., Wagner I., Vermeulen J., Niemi T. European Network of Forensic Science Institutes; 2015. Methodological Guidelines for Best Practice in Forensic Semiautomatic and Automatic Speaker Recognition, Including Guidance on the Conduct of Proficiency Testing and Collaborative Exercises. http://enfsi.eu/wp-content/uploads/2016/09/guidelines_fasr_and_fsasr_0.pdf
  • 56. Morrison G.S., Enzinger E. Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01) – Introduction. Speech Commun. 2016;85:119–126. doi: 10.1016/j.specom.2016.07.006.
  • 57. Meuwly D., Ramos D., Haraksim R. A guideline for the validation of likelihood ratio methods used for forensic evidence evaluation. Forensic Sci. Int. 2017;276:142–153. doi: 10.1016/j.forsciint.2016.03.048.

Articles from Forensic Science International: Synergy are provided here courtesy of Elsevier
