Abstract
In this paper we investigate various effects of inbreeding on the likelihood ratio (LR) in forensic kinship testing. The basic setup of such testing involves formulating two competing hypotheses, in the form of pedigrees, describing the relationship between the individuals. The likelihood of each hypothesis is computed given the available genetic data, and a conclusion is reached if the ratio of these exceeds some pre-determined threshold. An important aspect of this approach is that the hypotheses are usually not exhaustive: The true relationship may differ from both of the stated pedigrees. It is well known that this may introduce bias in the test results. Previous work has established formulas for the expected value and variance of the LR, given the two competing hypotheses and the true relationship. However, the proposed method only handles cases without inbreeding. In this paper we extend these results to all possible pairwise relationships. The key ingredient is formulating the hypotheses in terms of Jacquard coefficients instead of the more restricted Cotterman coefficients. While the latter describe the relatedness between outbred individuals, the more general Jacquard coefficients allow any level of inbreeding. Our approach also enables scrutiny of another frequently overlooked source of LR bias, namely background inbreeding. This ubiquitous phenomenon is usually ignored in forensic kinship computations, due to lack of adequate methods and software. By leveraging recent work on pedigrees with inbred founders, we show how background inbreeding can be modeled as a continuous variable, providing easy-to-interpret results in specific cases. For example, we show that if true siblings are subjected to a test for parent-offspring, moderate levels of background inbreeding are expected to inflate the LR by more than 50%.
Keywords: Kinship analysis, Inbred founders, IBD triangle, Jacquard coefficients, Likelihood ratios
Introduction
The conventional approach to forensic kinship testing includes formulating two hypotheses and calculating a likelihood ratio (LR) based on genetic data from genotyped individuals. Practice differs between countries and laboratories, but typically the LR or some version of it is included when the case is reported. The conclusion based on the LR may be flawed when the true pedigree connecting the individuals of interest differs from the pedigrees considered by the hypotheses. As an example, consider a standard paternity case, where the prosecution asserts that a certain man is the father of a child, while the defense claims that the man and the child are unrelated. The truth, on the other hand, may be that the man is the child’s uncle. A special case of incorrect hypotheses occurs when inbreeding is not accounted for. For example, if the alleged father is inbred, and this is ignored when formulating the hypotheses, this may significantly bias the LR. One aim of this paper is to investigate and quantify this effect.
Slooten and Egeland derived explicit equations for the expected value and variance of the LR [1]. They also extended this to cases where the true relationship differs from those stated in the hypotheses [2]. However, in both of these works only non-inbred individuals were considered. An important contribution of this paper is the extension of these results to general pairwise relationships. In particular, we show that exact expressions for the expected value and variance of the LR can be obtained also in cases with inbreeding. The expressions are in general more involved than in the non-inbred case, and not as easy to interpret. However, we derive interesting and practical results in important special cases.
A parametric approach to modeling background inbreeding in kinship testing was recently introduced [3], employing the concept of inbred founders [4]. To exemplify, consider a pair of paternal half siblings, whose father is assigned an inbreeding coefficient f. As f increases from 0 to 1, the relationship between the half siblings becomes genetically indistinguishable from that between parent and child. We extend the theoretical framework of [1, 2] to pedigrees with inbred founders. As a result, the impact of background inbreeding on the expectation and variance of the LR can be studied based on exact expressions. In cases where the amount of inbreeding is unknown, we can still provide guidance on the expected values for the LR. Our approach conveniently allows a continuous range of possible true alternatives rather than a discrete set of specific alternatives. To arrive at explicit results of practical interest, we restrict attention to pairwise relationships. Furthermore, as in the work of Slooten and Egeland, we ignore mutations, dropouts, and silent alleles, and we assume Hardy-Weinberg equilibrium (HWE). However, we explain how deviation from HWE can be modeled by the so-called theta (𝜃) correction.
R scripts and functions used to obtain numerical results in this paper are gathered in an R library (see the “R implementation” section). Pedigree likelihoods and marker simulations are performed with the forrel package [3].
This paper is organized in the following manner: After establishing some terminology and notation we review the main results of [2] regarding the expected value and variance of the LR for non-inbred pairs of individuals. We then proceed to extend these results to general pairwise relationships, including relationships in pedigrees with background inbreeding. Several worked examples follow, including a simulation study comparing our formulas with real-life results. Finally, we discuss some consequences of this work and how it relates to other aspects of forensic genetics.
Definitions and notation
A central concept for measuring genetic relatedness is that of identity by descent (IBD). Two alleles are said to be IBD relative to a given pedigree if they are identical by state and originate from the same ancestral allele within the pedigree [5].
Coefficients of inbreeding and kinship
The coefficient of inbreeding f, introduced by Wright [6], is the probability that an individual is autozygous at a given autosomal locus, i.e., that the two homologous alleles are IBD. This is the same as the kinship coefficient φ between the parents of the same individual, defined as the probability that a random allele from the mother is IBD to a random allele from the father at the same locus.
Founders of a pedigree are conventionally assumed to be unrelated and non-inbred. Following [3] we relax the second assumption, allowing an arbitrary inbreeding coefficient f to be assigned to any founder individual. For a given pedigree with N founders, we denote the set of founder inbreeding coefficients by f = (f1,…,fN).
Background inbreeding in human populations is normally low, but may exceed 5% in certain cases [7, 8]. In forensic case work inbreeding is common, ranging from consanguineous marriages between cousins, f = 1/16 or lower, to incestuous relationships between siblings or parent-child, both with f = 1/4. In breeding applications values closer to 1 may occur.
Jacquard coefficients and likelihood of a pedigree
The kinship coefficient is a coarse measure of relatedness; for instance, it has the same value for a parent-child relationship as for full siblings. A more refined measure is given by the nine Jacquard coefficients Δ = (Δ1,…,Δ9) [9], also called the condensed identity coefficients. These are the expected relative frequencies of the nine Jacquard states J1,…,J9.
The Jacquard states are depicted in Fig. 1. Alleles within each individual are unordered, and hence, several IBD configurations can correspond to the same Jacquard state. Furthermore, Δ is related to φ through
$$\varphi = \Delta_1 + \tfrac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \tfrac{1}{4}\Delta_8.$$
The likelihood of two individuals being related according to Δ, given their genotypes G = (g1,g2) at a marker may be expressed by conditioning on the Jacquard state:
$$P(G \mid \Delta) = \sum_{i=1}^{9} \Delta_i\, P(G \mid J_i). \tag{1}$$
The conditional probabilities P(G∣Ji) are listed in Table 1. These probabilities are found by direct calculations; for instance, P((aa,aa)∣J1) = pa since J1 dictates that all four alleles are IBD.
Table 1.
G | J1 | J2 | J3 | J4 | J5 | J6 | J7 | J8 | J9 |
---|---|---|---|---|---|---|---|---|---|
(aa,aa) | pa | pa² | pa² | pa³ | pa² | pa³ | pa² | pa³ | pa⁴ |
(aa,bb) | 0 | pa pb | 0 | pa pb² | 0 | pa² pb | 0 | 0 | pa² pb² |
(aa,ab) | 0 | 0 | pa pb | 2 pa² pb | 0 | 0 | 0 | pa² pb | 2 pa³ pb |
(aa,bc) | 0 | 0 | 0 | 2 pa pb pc | 0 | 0 | 0 | 0 | 2 pa² pb pc |
(ab,aa) | 0 | 0 | 0 | 0 | pa pb | 2 pa² pb | 0 | pa² pb | 2 pa³ pb |
(bc,aa) | 0 | 0 | 0 | 0 | 0 | 2 pa pb pc | 0 | 0 | 2 pa² pb pc |
(ab,ab) | 0 | 0 | 0 | 0 | 0 | 0 | 2 pa pb | pa pb (pa + pb) | 4 pa² pb² |
(ab,ac) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | pa pb pc | 4 pa² pb pc |
(ab,cd) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 pa pb pc pd |
The symbols a, b, c, and d represent different alleles, with population frequencies pa, pb, pc, and pd respectively
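The entries of Table 1 and the likelihood in Eq. 1 can be checked by brute-force enumeration. The following is a minimal sketch, not the authors' implementation: each Jacquard state is encoded as a labelling of the four alleles into IBD classes (an encoding assumed here, following the usual ordering J1,…,J9), and the names `JACQUARD_STATES`, `genotype_probs` and `pair_likelihood` are hypothetical.

```python
from itertools import product
from collections import defaultdict

# IBD class label of the four alleles (individual 1: first two, individual 2:
# last two). Equal labels mean the alleles are IBD. Assumed ordering J1..J9.
JACQUARD_STATES = {
    1: (0, 0, 0, 0), 2: (0, 0, 1, 1), 3: (0, 0, 0, 1),
    4: (0, 0, 1, 2), 5: (0, 1, 0, 0), 6: (0, 1, 2, 2),
    7: (0, 1, 0, 1), 8: (0, 1, 0, 2), 9: (0, 1, 2, 3),
}

def genotype_probs(state, freqs):
    """Return {(g1, g2): P(G | J_state)} for unordered genotypes g1 and g2."""
    labels = JACQUARD_STATES[state]
    n_classes = max(labels) + 1
    probs = defaultdict(float)
    # Draw one independent allele per IBD class, weighted by its frequency.
    for draw in product(sorted(freqs), repeat=n_classes):
        weight = 1.0
        for allele in draw:
            weight *= freqs[allele]
        a1, a2, b1, b2 = (draw[k] for k in labels)
        probs[(tuple(sorted((a1, a2))), tuple(sorted((b1, b2))))] += weight
    return dict(probs)

def pair_likelihood(g1, g2, delta, freqs):
    """Eq. 1: sum over Jacquard states of Delta_i * P(G | J_i)."""
    key = (tuple(sorted(g1)), tuple(sorted(g2)))
    return sum(d * genotype_probs(i, freqs).get(key, 0.0)
               for i, d in enumerate(delta, start=1) if d > 0)
```

For example, `pair_likelihood(("a", "a"), ("a", "b"), (0, 0, 0, 0, 0, 0, 0, 1, 0), freqs)` evaluates the parent-offspring likelihood (all weight on J8) for the genotype pair (aa,ab).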
IBD coefficients and inbred founders
For two non-inbred individuals, the first six Jacquard coefficients are zero, and Δ9, Δ8, and Δ7 reduce to the IBD coefficients κ = (κ0,κ1,κ2) introduced by Cotterman [10]. They give the probabilities that, at a given autosomal locus, the individuals share zero, one, and two alleles IBD, respectively. Note that κ0 + κ1 + κ2 = 1, so κ can be represented in a two-dimensional triangle with axes κ0 and κ2. Thompson [11] showed that the IBD coefficients are restricted to κ1² ≥ 4κ0κ2. This gives rise to an inadmissible region for the parameters, shown in gray in Fig. 2.
Although the IBD coefficients are only defined for non-inbred individuals, other members of the pedigree can be inbred. For example, a pair of half siblings remain outbred even if their shared parent is inbred. However, this inbreeding will affect the relatedness coefficients. Table 2 lists the kinship and the IBD coefficients for some common relationships, as functions of the founder inbreeding. The effects are visualized in Fig. 2. In the half sibling example, the genetic relationship approaches that of parent-child, as the founder inbreeding increases towards 1. Similarly, the IBD coefficients of full siblings with inbred parents may fall anywhere in the lightly shaded region towards the point of monozygotic twins.
Table 2.
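Relationship | φ | φ(f) | κ | κ(f) |
---|---|---|---|---|
S | 1/4 | (2 + f1 + f2)/8 | (1/4, 1/2, 1/4) | κ0(f1,f2) = (1 − f1)(1 − f2)/4, κ1(f1,f2) = (1 − f1f2)/2, κ2(f1,f2) = (1 + f1)(1 + f2)/4 |
H | 1/8 | (1 + f)/8 | (1/2, 1/2, 0) | κ0(f) = (1 − f)/2, κ1(f) = (1 + f)/2, κ2(f) = 0 |
U | 1/8 | (2 + f1 + f2)/16 | (1/2, 1/2, 0) | κ0(f1,f2) = (2 − f1 − f2)/4, κ1(f1,f2) = (2 + f1 + f2)/4, κ2(f1,f2) = 0 |
FC | 1/16 | (2 + f1 + f2)/32 | (3/4, 1/4, 0) | κ0(f1,f2) = (6 − f1 − f2)/8, κ1(f1,f2) = (2 + f1 + f2)/8, κ2(f1,f2) = 0 |
S full siblings, H half siblings, U uncle-nephew, FC first cousins. For S, f1 and f2 are the inbreeding coefficients of the two parents; for H, f is that of the shared parent; for U and FC, f1 and f2 are taken to be those of the two shared ancestors (the parents of the sibling pair connecting the two individuals)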
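The dependence of the sibling and half-sibling coefficients on founder inbreeding can be sketched in a few lines. The sketch assumes the standard argument that the two alleles transmitted by a parent with inbreeding coefficient f to two different children are IBD with probability (1 + f)/2; the function names are hypothetical and not taken from any package.

```python
from fractions import Fraction

def kappa_full_sibs(f1, f2):
    """IBD coefficients of full siblings whose parents have inbreeding f1, f2."""
    k0 = (1 - f1) * (1 - f2) / 4   # neither parental allele pair is IBD
    k2 = (1 + f1) * (1 + f2) / 4   # both parental allele pairs are IBD
    return (k0, 1 - k0 - k2, k2)

def kappa_half_sibs(f):
    """IBD coefficients of half siblings whose shared parent has inbreeding f."""
    return ((1 - f) / 2, (1 + f) / 2, 0 * f)

def kinship(kappa):
    """Kinship coefficient of a non-inbred pair: kappa1/4 + kappa2/2."""
    return kappa[1] / 4 + kappa[2] / 2
```

With f = 1 the half siblings land on the parent-child point (0, 1, 0), and with f1 = f2 = 1 the full siblings land on the monozygotic twin point (0, 0, 1), matching the limiting behaviour described in the text.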
Review of previous results
We next review the main results of [2] relevant for our work. In particular we restate the explicit formulas for the expectation and variance of the LR in the case of non-inbred individuals.
The likelihood ratio as a random variable
We consider a kinship test involving genetic data from two non-inbred individuals. Two hypotheses HP and HD about the relationship are to be compared using the LR. For our purposes, each hypothesis corresponds to a point in the IBD triangle, denoted by κP and κD respectively. However, the evidence may be generated from another pedigree, corresponding to a third point κT. We therefore have the following setup, comprising the competing hypotheses and the true relationship:
$$H_P:\ \kappa = \kappa_P, \qquad H_D:\ \kappa = \kappa_D, \qquad \text{Truth}:\ \kappa = \kappa_T.$$
Reflecting standard practice, we will always use unrelatedness as the defense hypothesis, i.e., κD = (1,0,0). It should be noted, however, that this is not a theoretical requirement for the methods presented here.
The concept of the likelihood ratio as a random variable was discussed by Slooten and Egeland [1]. We review the basics here, presented in a slightly simpler notation sufficient for our purposes.
Denote by Ki, i = 0,1,2, the event that the individuals share exactly i alleles IBD. As shown in Fig. 1, K0, K1, and K2 are identical to the Jacquard states J9, J8, and J7 respectively. For fixed κP the likelihood ratio for a given pair of genotypes G = (g1,g2) can be written as
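$$\mathrm{LR}(G) = \frac{P(G \mid H_P)}{P(G \mid H_D)} = \frac{\sum_{i=0}^{2} \kappa_{P,i}\, P(G \mid K_i)}{P(G \mid K_0)}. \tag{2}$$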
Note that the final transition was obtained by applying (1) in both the numerator and denominator. The probabilities P(G|Ki) are given in Table 1.
Now, viewing the genotypes as a random variable $\mathcal{G}$, we define the random variable $\mathrm{LR} = \mathrm{LR}(\mathcal{G})$. Note that the distribution of $\mathcal{G}$ is completely determined by κT (assuming HWE), hence the distribution of LR is determined by κP and κT. If these parameters are clear from the context, we will suppress them in our notation; otherwise, we write $\mathrm{LR}_{\kappa_P,\kappa_T}$. In the special case when HP equals the truth, i.e., κP = κT, we may simplify to $\mathrm{LR}_{\kappa_P}$.
Throughout, we assume the following condition to hold
$$P(G \mid H_D) = 0 \;\Longrightarrow\; P(G \mid H_P) = 0. \tag{3}$$
In the present context, it means that all DNA profiles that can occur under HP can also occur under HD. In our examples HD specifies unrelated individuals, and then (3) holds. The condition also holds for mutation models provided all elements of the mutation matrix are positive. We do not model mutations in the work presented here, as practical exact expressions are then no longer available. However, the implementation allows for general mutation models. Without (3), likelihood ratios could be infinite, i.e., not defined.
Expected likelihood ratio
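The expectation of LR may be found by summing over all possible genotypes G in the standard way:
$$E[\mathrm{LR}_{\kappa_P,\kappa_T}] = \sum_{G} \mathrm{LR}(G)\, P(G \mid \kappa_T), \tag{4}$$
where $P(G \mid \kappa_T) = \sum_{i=0}^{2} \kappa_{T,i}\, P(G \mid K_i)$. An exact expression for E[LR] when κP = κT was first derived in [1] and extended in [2] to apply when κP ≠ κT. For the latter situation it was shown that, for a single marker with L alleles,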
$$E[\mathrm{LR}_{\kappa_P,\kappa_T}] = \kappa_P\, A_0\, \kappa_T^{t}, \tag{5}$$
where t denotes the vector transpose, and
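$$A_0 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \frac{L+3}{4} & \frac{L+1}{2} \\ 1 & \frac{L+1}{2} & \frac{L(L+1)}{2} \end{pmatrix}. \tag{6}$$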
Importantly, the expected value depends only on the number of alleles, not on the allele frequencies. Furthermore, the expectation is symmetric in κP and κT, so that
$$E[\mathrm{LR}_{\kappa_P,\kappa_T}] = E[\mathrm{LR}_{\kappa_T,\kappa_P}]. \tag{7}$$
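The bilinear form in Eq. 5 is easy to evaluate exactly. The sketch below is illustrative only and assumes that A0 has the entries 1, (L + 3)/4, (L + 1)/2 and L(L + 1)/2 used in the code, an assumption that is consistent with the values reported in Table 3; the function names are hypothetical.

```python
from fractions import Fraction as F

def a0_matrix(L):
    """Assumed form of A0: entry (i, j) is E[LR] when H_P states K_i and K_j is true."""
    L = F(L)
    return [
        [F(1), F(1), F(1)],
        [F(1), (L + 3) / 4, (L + 1) / 2],
        [F(1), (L + 1) / 2, L * (L + 1) / 2],
    ]

def expected_lr(kappa_p, kappa_t, L):
    """Eq. 5: kappa_P * A0 * kappa_T^t for a single marker with L alleles."""
    A0 = a0_matrix(L)
    return sum(F(kp) * A0[i][j] * F(kt)
               for i, kp in enumerate(kappa_p)
               for j, kt in enumerate(kappa_t))
```

Because only L enters, the same call covers any allele frequency distribution; swapping the two κ arguments returns the same value, in line with the symmetry in Eq. 7.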
Variance of the likelihood ratio
To derive the variance of LR we apply the general formula Var[LR] = E[LR²] − E[LR]². Since the last term follows from Eq. 5, all that remains is to find the first term. Some notation is needed:
Furthermore, supplementing the matrix A0 given in Eq. 6, we define matrices A1 and A2 by
8 |
9 |
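It was shown in [2] that
$$E[\mathrm{LR}_{\kappa_P,\kappa_T}^{2}] = \sum_{k=0}^{2} \kappa_{T,k}\; \kappa_P\, A_k\, \kappa_P^{t},$$
hence, the complete variance expression becomes
$$\operatorname{Var}[\mathrm{LR}_{\kappa_P,\kappa_T}] = \sum_{k=0}^{2} \kappa_{T,k}\; \kappa_P\, A_k\, \kappa_P^{t} - \bigl(\kappa_P\, A_0\, \kappa_T^{t}\bigr)^{2}. \tag{10}$$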
Contrary to the expected LR, the variance of the LR depends on the allele frequencies.
Example: paternity testing
This example serves as an illustration of the above described expected LR and the corresponding hypotheses. Consider a paternity case, where a man is claimed to be the father of a child (HP). The truth is that a brother of the alleged father is the true father of the child. The hypotheses and the true relatedness are in terms of the IBD coefficients given as
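$$H_P:\ \kappa_P = (0, 1, 0), \qquad H_D:\ \kappa_D = (1, 0, 0), \qquad \text{Truth}:\ \kappa_T = \bigl(\tfrac{1}{2}, \tfrac{1}{2}, 0\bigr). \tag{11}$$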
Figure 3 illustrates the hypotheses in terms of pedigrees, and as points in the IBD triangle. Equation (5), with IBD coefficients as in Eq. 11, simplifies to
$$E[\mathrm{LR}] = \frac{1}{2}\cdot 1 + \frac{1}{2}\cdot\frac{L+3}{4} = \frac{L+7}{8}. \tag{12}$$
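The variance of LR becomes
$$\operatorname{Var}[\mathrm{LR}] = -\frac{(L-1)(L-9)}{64} + \frac{1}{32}\left[\sum_{a<b} \frac{(p_a+p_b)^{3}}{p_a p_b} + \sum_{a} \frac{1}{p_a} \sum_{\substack{b \neq c \\ b,c \neq a}} p_b p_c\right],$$
where the sums run over the alleles of the marker and the innermost sum is over ordered pairs of distinct alleles b, c different from a.
In the special case L = 2, and allele frequencies q and 1 − q, the variance expression reduces to
$$\operatorname{Var}[\mathrm{LR}] = \frac{7}{64} + \frac{1}{32\,q(1-q)}.$$
This expression is minimal when q = 0.5 and becomes infinitely large when q or 1 − q approaches 0. If no assumption is made for L, but all alleles are assumed equally frequent, the variance reduces to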
$$\operatorname{Var}[\mathrm{LR}] = \frac{(L-1)(L+13)}{64}. \tag{13}$$
Table 3 exemplifies these formulas for various realistic values of L, and compares the results with the corresponding values if HP was true.
Table 3.
Truth | κP | κT | L = 2 | L = 10 | L = 50 |
---|---|---|---|---|---|
PO | (0, 1, 0) | (0, 1, 0) | 1.250 (0.188) | 3.250 (1.686) | 13.250 (9.188) |
U | (0, 1, 0) | (1/2, 1/2, 0) | 1.125 (0.234) | 2.125 (3.234) | 7.125 (48.230) |
Each entry shows E[LR] with the variance in parentheses. The variances are computed assuming uniform allele frequencies. The bottom row (U) shows the values when the true pedigree is uncle-nephew, as analyzed in the main text. For comparison, the top row shows the corresponding numbers when HP is true
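The mean and variance in this example can also be obtained without any closed formula, by summing over all genotype pairs. The sketch below is an independent check, not the method of [2]: it computes the parent-offspring likelihood by Mendelian transmission and treats the uncle-nephew truth as a mixture of the unrelated and parent-offspring states. All function names are hypothetical.

```python
from itertools import combinations_with_replacement

def hw_prob(g, p):
    """Hardy-Weinberg probability of an unordered genotype g = (a, b)."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]

def transmit_prob(child, parent, p):
    """P(child | parent): one parental allele is passed on, the other is random."""
    total = 0.0
    for passed in parent:                 # each parental allele with prob. 1/2
        for other in p:
            if tuple(sorted((passed, other))) == child:
                total += 0.5 * p[other]
    return total

def lr_moments(kappa_true, p):
    """Mean and variance of LR (parent-offspring vs unrelated) when the truth
    has IBD coefficients kappa_true = (k0, k1, 0)."""
    k0, k1, k2 = kappa_true
    if k2 != 0:
        raise ValueError("this sketch only handles kappa2 = 0")
    genos = list(combinations_with_replacement(sorted(p), 2))
    m1 = m2 = 0.0
    for g1 in genos:
        for g2 in genos:
            p_unrel = hw_prob(g1, p) * hw_prob(g2, p)
            p_po = hw_prob(g1, p) * transmit_prob(g2, g1, p)
            lr = p_po / p_unrel
            p_true = k0 * p_unrel + k1 * p_po
            m1 += lr * p_true
            m2 += lr ** 2 * p_true
    return m1, m2 - m1 ** 2
```

Calling `lr_moments((0.5, 0.5, 0), p)` with different frequency vectors `p` of the same length illustrates the point made in the text: the mean stays fixed while the variance changes with the allele frequencies.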
Likelihood ratio for general pairwise relationships
In this section we extend the results reviewed above to relationships between any pairs of individuals. In particular we now allow inbreeding. For this to work we must pass from the IBD coefficients to the full set of Jacquard coefficients. For details regarding the derivations of the results, see the Appendix.
Expected likelihood ratio
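We use the same setup for kinship testing as introduced previously, but in order to allow general inbreeding, we now formulate our hypotheses using Jacquard coefficients,
$$H_P:\ \Delta = \Delta_P, \qquad H_D:\ \Delta = \Delta_D = (0,0,0,0,0,0,0,0,1), \qquad \text{Truth}:\ \Delta = \Delta_T.$$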
Note that the defense hypothesis still corresponds to unrelatedness. We are interested in the likelihood ratio comparing HP with HD when the genotypes are generated by a pedigree with the Jacquard coefficients ΔT. Equation (1) implies that
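$$\mathrm{LR}(G) = \frac{P(G \mid \Delta_P)}{P(G \mid \Delta_D)} = \frac{\sum_{i=1}^{9} \Delta_{P,i}\, P(G \mid J_i)}{P(G \mid J_9)}. \tag{14}$$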
As shown in the Appendix, the expected LR is
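$$E[\mathrm{LR}_{\Delta_P,\Delta_T}] = \Delta_P\, B_9\, \Delta_T^{t}, \tag{15}$$
where B9 is the symmetric 9 × 9 matrix given in Table 4, whose elements are $E[\mathrm{LR}_{i,j}]$, for 1 ≤ i,j ≤ 9. Here $\mathrm{LR}_{i,j}$ is the likelihood ratio comparing Jacquard state Ji with J9 when the genotypes are generated under Jj. As opposed to the non-inbred case, we see that the expected value in general depends on the allele frequencies.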
Table 4.
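 | J1 | J2 | J3 | J4 | J5 | J6 | J7 | J8 | J9 |
---|---|---|---|---|---|---|---|---|---|
J1 | Σ 1/pa² | Σ 1/pa | Σ 1/pa | L | Σ 1/pa | L | Σ 1/pa | L | 1 |
J2 | | L² | L | L | L | L | L | 1 | 1 |
J3 | | | (L + Σ 1/pa)/2 | L | L | 1 | L | (L + 1)/2 | 1 |
J4 | | | | L | 1 | 1 | 1 | 1 | 1 |
J5 | | | | | (L + Σ 1/pa)/2 | L | L | (L + 1)/2 | 1 |
J6 | | | | | | L | 1 | 1 | 1 |
J7 | | | | | | | L(L + 1)/2 | (L + 1)/2 | 1 |
J8 | | | | | | | | (L + 3)/4 | 1 |
J9 | | | | | | | | | 1 |
Each row represents Ji, a Jacquard state assumed by HP, while each column represents Jj, the true Jacquard state. The matrix is symmetric, so only the upper triangle is shown. Σ denotes summation over the L alleles a of the marker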
Variance of the likelihood ratio
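In the Appendix, matrices B1,…,B9 are defined and it is shown that
$$E[\mathrm{LR}_{\Delta_P,\Delta_T}^{2}] = \sum_{k=1}^{9} \Delta_{T,k}\; \Delta_P\, B_k\, \Delta_P^{t}. \tag{16}$$
From this we obtain the variance formula
$$\operatorname{Var}[\mathrm{LR}_{\Delta_P,\Delta_T}] = \sum_{k=1}^{9} \Delta_{T,k}\; \Delta_P\, B_k\, \Delta_P^{t} - \bigl(\Delta_P\, B_9\, \Delta_T^{t}\bigr)^{2}. \tag{17}$$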
Pairwise relationships with inbred founders
As previously explained, a set of inbreeding coefficients f can be assigned to the founders of a pedigree to model background inbreeding. The Jacquard coefficients of any pair of pedigree members are then functions of f. It follows that the formulas for expectation and variance of LR involving such pedigrees remain as in Eqs. 15 and 17, except that the parameters ΔP and ΔT must be updated.
Specifically, let fP be a vector of founder inbreeding coefficients in the pedigree assumed by HP, and fT similarly in the true pedigree. The expectation and variance of LR in this situation are then given by
$$E[\mathrm{LR}] = \Delta_P(f_P)\, B_9\, \Delta_T(f_T)^{t}$$
and
$$\operatorname{Var}[\mathrm{LR}] = \sum_{k=1}^{9} \Delta_{T,k}(f_T)\; \Delta_P(f_P)\, B_k\, \Delta_P(f_P)^{t} - \bigl(\Delta_P(f_P)\, B_9\, \Delta_T(f_T)^{t}\bigr)^{2}.$$
Note that the matrices Bi only depend on L and the allele frequencies, and therefore are unchanged by founder inbreeding.
Remark 1
It should be emphasized that the formulas (15) and (17) are needed only when at least one of the tested individuals is inbred in some of the involved pedigrees. If both are non-inbred, the simpler expressions (5) and (10) using IBD coefficients suffice. Importantly, this remains true if other members of the pedigree are inbred, as long as this does not lead to inbreeding in the tested individuals. In particular, founder inbreeding may be accounted for in Eqs. 5 and 10 simply by replacing κP and κT by κP(fP) and κT(fT) respectively.
Founder inbreeding and 𝜃 correction
The conventional approach to background relatedness in forensics is the so-called 𝜃 correction [12]. In an inbred population, the composition of genotypes does not follow the Hardy-Weinberg principle, implying that the frequencies given in Table 1 no longer hold. The following approach compensates for this by adjusting the allele frequencies. Without loss of generality we can assume that the observed alleles are sampled sequentially. The probability that allele i is sampled as the jth allele is given by the sampling formula
$$P(\text{allele } i \text{ sampled as the } j\text{th allele}) = \frac{b_j\,\theta + (1-\theta)\,p_i}{1 + (j-2)\,\theta}, \tag{18}$$
where pi is the population frequency of allele i and bj denotes the number of alleles of type i among the j − 1 previously sampled. Note that for pairwise cases, the likelihood can be written
$$P(G \mid \Delta, \theta) = \sum_{i=1}^{9} \Delta_i\, P(G \mid J_i, \theta), \tag{19}$$
where P(G∣Ji,𝜃) is calculated using Eq. 18. The matrices B1,...,B9 then change with 𝜃, modifying the expectation and variance of the LR. This emphasises a fundamental difference between founder inbreeding and 𝜃 correction: f modifies the relationship itself, while 𝜃 only impacts the genotype probabilities.
Example: 𝜃 correction and founder inbreeding in a paternity case
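This example compares 𝜃 correction to founder inbreeding. Consider first the hypothesis HD: A and B are unrelated. Assume both individuals are homozygous a/a. Equation (18) gives the likelihood
$$P(G \mid H_D, \theta) = p_a\,\bigl(\theta + (1-\theta)p_a\bigr)\,\frac{2\theta + (1-\theta)p_a}{1+\theta}\cdot\frac{3\theta + (1-\theta)p_a}{1+2\theta}.$$
If rather than using 𝜃 correction, we assign an inbreeding coefficient f to A, the likelihood becomes
$$P(G \mid H_D, f) = \bigl(f p_a + (1-f)p_a^{2}\bigr)\,p_a^{2}.$$
Consider next the hypothesis HP1: A is the father of B. Equation (18) now gives
$$P(G \mid H_{P1}, \theta) = p_a\,\bigl(\theta + (1-\theta)p_a\bigr)\,\frac{2\theta + (1-\theta)p_a}{1+\theta},$$
and so the LR with 𝜃 correction is
$$\mathrm{LR}_{\theta} = \frac{1+2\theta}{3\theta + (1-\theta)p_a}.$$
The inbreeding coefficient approach gives
$$P(G \mid H_{P1}, f) = \bigl(f p_a + (1-f)p_a^{2}\bigr)\,p_a,$$
and LRf = 1/pa. Note that the LR does not depend on f and that this is true for all genotype combinations for A and B. The LRs for other genotype combinations for A and B with 𝜃 correction are given in Table 10.8 in [13].
To illustrate (19) consider the hypothesis HP2: A and B are paternal half siblings whose father is inbred with inbreeding coefficient f. Table 2 then gives Δ8 = (1 + f)/2 and Δ9 = (1 − f)/2, and by Eqs. 18 and 19 we may write down the likelihood for any genotype combinations. For instance, when A is homozygous a/a and B homozygous b/b the likelihood is
$$P(G \mid H_{P2}, \theta) = \frac{1-f}{2}\,P(G \mid J_9, \theta) = \frac{1-f}{2}\; p_a\,\bigl(\theta + (1-\theta)p_a\bigr)\,\frac{(1-\theta)p_b}{1+\theta}\cdot\frac{\theta + (1-\theta)p_b}{1+2\theta}.$$
The LR comparing HP2 with A and B being unrelated becomes (1 − f)/2. If A and B share alleles, the LR will depend also on 𝜃.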
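The sequential sampling idea behind the 𝜃 correction can be sketched as follows. The code assumes the standard form of the sampling formula, (bj𝜃 + (1 − 𝜃)pi)/(1 + (j − 2)𝜃), and covers only ordered allele sequences; the helper names are hypothetical.

```python
def sample_prob(allele, previous, p, theta):
    """Probability that `allele` is sampled next, given the alleles sampled so far."""
    j = len(previous) + 1                # position of the allele being sampled
    b = previous.count(allele)           # copies of this allele seen earlier
    return (b * theta + (1 - theta) * p[allele]) / (1 + (j - 2) * theta)

def sequence_prob(alleles, p, theta):
    """Probability of an ordered allele sequence under the sampling formula.
    Heterozygous genotypes need an extra factor 2 per genotype (not applied here)."""
    prob = 1.0
    for k, allele in enumerate(alleles):
        prob *= sample_prob(allele, alleles[:k], p, theta)
    return prob

def paternity_lr_homozygous(pa, theta):
    """LR for father a/a and child a/a: three sampled alleles versus four."""
    p = {"a": pa}
    return sequence_prob(["a"] * 3, p, theta) / sequence_prob(["a"] * 4, p, theta)
```

Setting `theta = 0` recovers the Hardy-Weinberg value 1/pa, which is also the value obtained with founder inbreeding in the example.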
R implementation
Utilities to perform the computations in this paper are provided in an R library named InbredLR, available from the first author, building on several packages in the ped suite, notably pedprobr and forrel [3]. The core of InbredLR is a set of functions that compute the expectation and variance of the likelihood ratio for pairwise relationships. The user can specify the parameters (κ, f or Δ) or specify the pedigrees, possibly with inbred founders. A function for simulating marker data to estimate the distribution of the LR is also provided, as well as a function for visualizing the pedigrees HP and HD, the true pedigree, and the location of the corresponding IBD coefficients in the IBD triangle.
Results
Paternity case for siblings with inbred founders
Consider two individuals who claim to be related as parent and offspring. Their true relationship is siblings, and their parents' coefficients of inbreeding are fT = (f1,f2). Figure 4 shows the case. This example can be relevant for family reunion cases, where a parent-child relationship would give the right to a residence permit, whereas a sibling relationship would not. Such a case is considered in [14]. HP, HD, and the true relationship are given in terms of the IBD coefficients as
$$H_P:\ \kappa_P = (0, 1, 0), \qquad H_D:\ \kappa_D = (1, 0, 0), \qquad \text{Truth}:\ \kappa_T(f_T), \tag{20}$$
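where κT(fT) = κT(f1,f2) are as in the first row of Table 2. Keeping in mind Remark 1, we apply (5) to find the expected LR:
$$E[\mathrm{LR}] = \kappa_{T,0} + \frac{L+3}{4}\,\kappa_{T,1} + \frac{L+1}{2}\,\kappa_{T,2} = \frac{L+3}{4} + \frac{(f_1+f_2)(L-1)}{8}. \tag{21}$$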
Figure 5 plots E[LR] as a function of the inbreeding level (assuming f1 = f2), for a single locus with L = 2, 10 and 50 alleles.
Without founder inbreeding, E[LR] = (L + 3)/4. Interestingly, this is the same as the expectation if HP was true, i.e., if the two individuals were in fact father and son (see first row of Table 3). The variance of LR differs between the two cases, however (not shown here).
As the background inbreeding of the true sibling pedigree increases, E[LR] increases. The expected LR of the paternity case (and hence the trust in HP) is therefore higher if the true relatedness is siblings with background inbreeding, rather than the tested parent-child relationship. The variance of LR decreases moderately for increasing founder inbreeding. For increasing number of alleles L, the slope of the expected LR increases.
The following calculation gives a simple approximation of the inflation in the expected LR caused by background inbreeding. Suppose f1 = f2 = f, and write (21) as μ0 + μf, where μ0 = (L + 3)/4 is the expected LR without founder inbreeding, and μf = f(L − 1)/4 is the expected contribution caused by founder inbreeding. Note that μf/μ0 = f(L − 1)/(L + 3), and that for L ≥ 5 we have (L − 1)/(L + 3) ≥ 1/2. This implies that with N independent markers, the total LR has expectation
$$E[\mathrm{LR}_{\text{total}}] = \prod_{n=1}^{N} \bigl(\mu_{0,n} + \mu_{f,n}\bigr) \;\geq\; \Bigl(1 + \frac{f}{2}\Bigr)^{N} \prod_{n=1}^{N} \mu_{0,n}.$$
This means that a background inbreeding level f will inflate the expected LR by at least a factor (1 + f/2)^N ≥ 1 + Nf/2. For example, if N = 20 and f = 0.05, the inflation rate is greater than 50%.
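The size of the inflation is quick to tabulate. The sketch below assumes the single-marker expectation (L + 3)/4 + f(L − 1)/4 for true siblings with equally inbred parents tested as parent-offspring (the form obtained by inserting f1 = f2 = f into Eq. 5), and multiplies over independent markers. The function names and the marker sizes used in the example are illustrative.

```python
def expected_lr_sibs_as_po(L, f):
    """Assumed single-marker E[LR]: true siblings (parents' inbreeding f) tested as PO."""
    return (L + 3) / 4 + f * (L - 1) / 4

def inflation(allele_counts, f):
    """Ratio of the expected total LR with and without founder inbreeding,
    for independent markers with the given numbers of alleles."""
    ratio = 1.0
    for L in allele_counts:
        ratio *= expected_lr_sibs_as_po(L, f) / expected_lr_sibs_as_po(L, 0.0)
    return ratio

def lower_bound(n_markers, f):
    """The bound (1 + f/2)^N, valid when every marker has at least 5 alleles."""
    return (1 + f / 2) ** n_markers
```

For instance, `inflation([10] * 20, 0.05)` can be compared with `lower_bound(20, 0.05)` to see how conservative the bound is for markers with more than five alleles.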
Siblings and half siblings with founder inbreeding
Distinguishing between siblings and half siblings can be difficult based on unlinked markers. Mayor and Balding address the problem in [15], with focus on the number of loci needed. If the shared parent of the half siblings has inbreeding coefficient fT > 0, the problem becomes even more interesting.
Consider the situation shown in Fig. 6. The hypotheses are
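$$H_P:\ \kappa_P(f_P), \qquad H_D:\ \kappa_D = (1, 0, 0), \qquad \text{Truth}:\ \kappa_T(f_T), \tag{22}$$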
where fP = (f1,f2) are the parental inbreeding coefficients in the HP pedigree and κP(fP) and κT(fT) are as in the first and second rows of Table 2, respectively. This setup facilitates modeling of background inbreeding in both the true pedigree and in HP. Equation (5) gives
$$E[\mathrm{LR}] = \frac{1-f_T}{2} + \frac{1+f_T}{2}\left[\frac{L+3}{4} + \frac{(f_1+f_2)(L-1)}{8}\right]. \tag{23}$$
In Fig. 7, the expectation of LR is shown as a function of founder inbreeding fT of the true half sibling pedigree, for HP stating a sibling pedigree with founder inbreeding fP = 0 and 0.2 (assuming f1 = f2), and L = 2, 10 and 20 alleles at a locus. For increasing values of fT, E[LR] increases, for all values of fP, and the evidence in favor of a sibling relationship becomes stronger.
Consider next the situation when f1 = f2 = 0. HP then assumes a sibling relationship without inbred founders. Figure 8 shows E[LR] (dashed line) and LR computations from 1000 sets of simulated data, as a function of fT. The solid line gives the mean value of the simulated LR. The expected LR increases slightly as founder inbreeding increases. For Fig. 8a this seems to fit well with the mean values of the LRs from simulated data. These simulations assume 13 loci, each with 3 alleles with allele frequencies 0.4, 0.3 and 0.3. In Fig. 8b, on the other hand, there is a substantial difference between E[LR] and the mean of the simulated LRs. These simulations use 13 CODIS markers with allele frequencies ranging from 0.0003 to 0.5378 (allele frequencies are available as a part of the R library InbredLR, see the “R implementation” section). Alleles with low frequencies will more seldom be present in the simulations. The expected LR only depends on the number of alleles at a locus, but because of the rare alleles, the simulations in practice involve a lower number of alleles at these loci. The simulations in Fig. 8c use the same markers, but with uniform allele frequencies at each locus. The expectation of the LR is independent of the allele frequencies and is therefore not changed, but now the mean of the simulated LRs is closer to the expected value. Even though E[LR] is independent of the allele frequencies, the variance is not, and small allele frequencies increase the variance.
Finally, we offer an approximation of the inflation in the expected LR due to background inbreeding. For simplicity, we assume f1 = f2 = 0 so that HP states a normal sibling relationship. From Eq. 23 the expected LR is μ0 = (L + 7)/8 if fT = 0. On the other hand, if fT > 0, the expected additional contribution to the LR is fT(L − 1)/8. For L ≥ 5 we have (L − 1)/(L + 7) ≥ 1/3, and it follows that
$$E[\mathrm{LR}_{\text{total}}] \;\geq\; \Bigl(1 + \frac{f_T}{3}\Bigr)^{N} \prod_{n=1}^{N} \mu_{0,n}.$$
A background inbreeding level of fT will inflate the expected LR by at least a factor (1 + fT/3)^N ≥ 1 + NfT/3. For example, with N = 20 and fT = 0.05, the inflation rate is greater than 33%.
Paternity case with inbreeding
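Consider a paternity case with hypotheses as shown in Fig. 9. The alleged father is indeed the true father and has inbreeding coefficient fT. We will analyze the consequences of ignoring the inbreeding in HP. The hypotheses are parameterized in the following way:
$$H_P:\ \Delta_P = (0,0,0,0,0,0,0,1,0), \qquad H_D:\ \Delta_D = (0,0,0,0,0,0,0,0,1), \qquad \text{Truth}:\ \Delta_T(f_T) = (0,0,f_T,0,0,0,0,1-f_T,0).$$
The expression for the expected LR simplifies considerably since most elements of ΔP and ΔT(fT) are zero. Equation (15) gives
$$E[\mathrm{LR}] = f_T\,\frac{L+1}{2} + (1-f_T)\,\frac{L+3}{4} = \frac{L+3}{4} + f_T\,\frac{L-1}{4},$$
and we see that E[LR] increases linearly from (L + 3)/4 to (L + 1)/2 as fT goes from 0 to 1.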
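Consider next the variance. For brevity, we define
$$h(i,j,k) = \sum_{G} \frac{P(G \mid J_i)\,P(G \mid J_j)\,P(G \mid J_k)}{P(G \mid J_9)^{2}}. \tag{24}$$
Note that h(i,j,k) is invariant under permutations of i,j,k. Equation (16) gives
$$E[\mathrm{LR}^{2}] = (1-f_T)\,h(8,8,8) + f_T\,h(8,8,3).$$
Slooten and Egeland [1] derived the term not involving inbreeding, i.e.,
$$h(8,8,8) = \frac{L+1}{2} + \frac{1}{16}\left[\sum_{a<b} \frac{(p_a+p_b)^{3}}{p_a p_b} + \sum_{a} \frac{1}{p_a} \sum_{\substack{b \neq c \\ b,c \neq a}} p_b p_c\right].$$
To derive the remaining term we condition on the zygosity of the son. If he is homozygous a/a, the father must also be a/a (recall that we are conditioning on Jacquard state J3). Conversely, if the son is heterozygous a/b, the father is equally likely to be a/a or b/b. This gives
$$h(8,8,3) = \frac{1}{4}\Bigl(3L + \sum_{a} \frac{1}{p_a}\Bigr).$$
In summary,
$$\operatorname{Var}[\mathrm{LR}] = (1-f_T)\,h(8,8,8) + f_T\,h(8,8,3) - \Bigl(\frac{L+3}{4} + f_T\,\frac{L-1}{4}\Bigr)^{2}. \tag{25}$$
This is a concave function with respect to fT. Figure 10 shows E[LR] and one standard deviation on each side as a function of founder inbreeding fT, for different numbers of alleles at a locus.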
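The mean and variance in this example can be checked numerically by enumerating genotype pairs under the Jacquard states involved. The following is a sketch under stated assumptions, not the InbredLR implementation: the father is taken as the first individual, the states J3, J8 and J9 are encoded as IBD-class labels of the four alleles, and all names are hypothetical.

```python
from itertools import product
from collections import defaultdict

# IBD class labels (father's two alleles first, then the child's), assumed encoding.
STATES = {3: (0, 0, 0, 1), 8: (0, 1, 0, 2), 9: (0, 1, 2, 3)}

def state_probs(labels, p):
    """Return {(g_father, g_child): probability} under one Jacquard state."""
    out = defaultdict(float)
    for draw in product(sorted(p), repeat=max(labels) + 1):
        w = 1.0
        for a in draw:
            w *= p[a]
        x = [draw[k] for k in labels]
        out[(tuple(sorted(x[:2])), tuple(sorted(x[2:])))] += w
    return out

def lr_moments_inbred_father(f, p):
    """Mean and variance of the LR (paternity vs unrelated, inbreeding ignored
    in H_P) when the true father has inbreeding coefficient f."""
    p3, p8, p9 = (state_probs(STATES[s], p) for s in (3, 8, 9))
    m1 = m2 = 0.0
    for g, prob9 in p9.items():
        lr = p8.get(g, 0.0) / prob9                  # H_P uses state J8 only
        p_true = f * p3.get(g, 0.0) + (1 - f) * p8.get(g, 0.0)
        m1 += lr * p_true
        m2 += lr ** 2 * p_true
    return m1, m2 - m1 ** 2
```

Evaluating `lr_moments_inbred_father` on a grid of `f` values for fixed frequencies `p` traces out a linear mean and a concave variance, in line with the behaviour described above.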
Discussion
In testing theory, the formulation of hypotheses is crucial. Kinship problems, as considered in this paper, are no exception. The convention of kinship testing is to compare two specific relationships using the LR. In most applications other than kinship problems, the hypotheses together span many, if not all, alternatives. For instance, a common example is testing of HWE against all possible deviations from HWE. In forensic genetics, HP: “paternity” is typically tested only against HD: “unrelated,” not all other alternatives. For this reason, it becomes essential to study what happens when the truth is neither of these hypotheses.
A pairwise non-inbred relationship can be represented by a point in the IBD triangle (see Fig. 2), or in general by the Jacquard coefficients (see Fig. 1). We have presented two ways of expressing the hypotheses and the true relationship: (i) through the Jacquard coefficients, and (ii) through background relatedness or founder inbreeding. These approaches let us investigate the LR for a continuous range of relationships and values of background relatedness. In both cases, the impact on the LR has been studied by deriving exact expressions for its mean and variance. In the latter case, the required formula follows rather directly by extending results in [1] and [2]. Explicit formulas for the expected LR have been derived for several sets of relationships. In the case of Jacquard coefficients, the explicit formulas are complicated to derive, and they depend on allele frequencies. An exact expression is given also for the variance. However, as the variance depends on allele frequencies, simple closed formulas can only be derived in special cases. For general applications we rely instead on the exact numerical implementation freely available in the R library InbredLR accompanying this paper.
Equipped with the results of this paper, we can address the following question when presented with a standard LR comparing two completely specified hypotheses HP and HD: What if the true relationship between the individuals is not as stated by HP? Or this slightly different question: What if the true relationship is restricted to some particular region of the IBD triangle? Obviously, the LR can be re-evaluated to reflect the new specifications. However, the exact expressions for expectation and variance of the LR can in some cases directly allow for statements valid for a continuous range of alternatives. For instance, regions obtained by varying founder inbreeding have been displayed in Fig. 2. Assume an LR has been reported in a paternity case and that inbreeding in the father has been ignored. It is then useful to know that accounting for inbreeding would imply an increase in the expected LR. This finding could be essential as there may not be data available to estimate the inbreeding coefficient for the father, in which case an exact LR calculation is not feasible.
Because the definition of “common ancestor” sometimes differs, there is a slight difference in the definition of IBD in the literature. The paper [16] gives three definitions of IBD: ancient IBD, recent IBD, and familial IBD. Our definition of IBD goes in the category of familial IBD, where “common ancestor” is restricted to a given pedigree.
The conventional approach to background relatedness in forensics is the so-called theta (𝜃) correction [12]. Typical values are 𝜃 ∈ (0.01,0.03). The 𝜃 parameter applies on a population level. The genotype probabilities of all founders in the pedigree are modified compared with what HWE would give. Our approach does not model relatedness between founders, but offers a richer model of inbreeding, since individual inbreeding coefficients can be specified for each founder.
Several authors (see, e.g., [2] and the references therein) have discussed reporting the logarithm of the LR rather than the LR. Nice expressions like the ones presented for the expectation and the variance are then no longer available. In most cases, the LR is reported on the original scale. In some circumstances, as for paternity cases, the LR may be 0, and then, the logarithm is not defined. Many papers including [17] study the distribution of by simulation. Equipped with the exact expressions of this paper, could be analyzed without resorting to simulation, since the mean and variance of can be derived from the counterparts for the LR. However, if some allele frequencies are close to 0, is not well approximated by a normal distribution for a realistic number of markers. The reason for this is the large variance when allele frequencies are small. For instance, (25) shows an example where the expression for the variance include terms of the form 1/pa and these become large whenever the allele frequency pa is small. A similar problem related to small allele frequencies is discussed in the result section. This demonstrates that the center of the distribution, calculated from the expectation of , can be inaccurate. However, this criticism applies to the use of instead of in general, and not specifically to the expectations. We maintain that results like the ones presented for the expectation and variance have considerable theoretical interest, but should be used with caution in practice.
This paper has mainly addressed the likelihood ratio and its properties. The exclusion probability (EP), the probability that genotypes will be incompatible with a claimed relationship, is also an important statistic. The impact of founder inbreeding on EP is discussed in [3].
Figure 4 illustrates a case where the true inbred relationship is not known, and Fig. 5 shows the corresponding expected LR for a single marker. In this paternity case, increasing the number of markers increases the inflation of the expected LR, so adding more markers to the LR computation will not solve the problem. In general, with a sufficient number of markers, the Jacquard, IBD, or inbreeding coefficients can be estimated accurately, and the true relationship detected. If such additional marker data are not available, the impact of inbreeding can be studied as exemplified by the paternity case with unknown inbreeding earlier in the discussion and as illustrated in, e.g., Fig. 5. As addressed in the “Introduction” section, different scenarios can be investigated, and LR results can be evaluated in light of the analyses of these scenarios.
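The compounding over markers can be sketched as follows, assuming independent markers so that the expected LR over all markers is the product of the single-marker expectations. The per-marker ratio of 1.02 (expected LR with inbreeding divided by expected LR without) is a made-up illustrative value, not a result from this paper, and the function name is hypothetical.

```python
import math

def total_inflation(per_marker_ratios):
    """Overall inflation of the expected LR for independent markers.

    Each entry is the single-marker ratio
    E[LR | inbreeding present] / E[LR | inbreeding ignored].
    """
    return math.prod(per_marker_ratios)

# Hypothetical per-marker ratio, identical for all markers (illustration only).
for n_markers in (1, 10, 20, 50):
    print(n_markers, round(total_inflation([1.02] * n_markers), 3))
```

Any per-marker ratio above 1 compounds multiplicatively, which is why adding markers increases rather than removes the inflation in such a case.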
The present paper does not consider linked markers. For independent loci, the inbreeding coefficients contain sufficient information to compute the Jacquard coefficients needed in our formulas for LR. While a similar approach is conceivable also for linked markers, this would involve multi-locus coefficients, which is outside the scope of this work.
Appendix: Expectation and variance of LR
Below we derive the expressions for the expectation and variance of the LR in the general pairwise case. Let $J_i$ denote Jacquard state $i$, and consider the probabilities of $J_i$ according to the relationship stated by $H_P$ and according to the true relationship, respectively. The likelihood ratio of interest compares $H_P: \Delta_P$ with $H_D: \Delta_D$ when the marker data come from the relationship $\Delta_T$. Similarly, we consider the likelihood ratio comparing Jacquard state $J_i$ with unrelated, i.e., $J_9$, when the marker data are generated by $J_j$.
Equation (15) follows by combining (1), (14), and (4):
[Equation (26)]
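In the case of no inbreeding, i.e., $\Delta_1 = \cdots = \Delta_6 = 0$, the above expression reduces to (5). The part of the $9 \times 9$ matrix $B_9$ corresponding to $(J_7, J_8, J_9)$ coincides with the matrix given in Eq. 6. Since interchanging the tested state $J_i$ and the data-generating state $J_j$ leaves the expected likelihood ratio unchanged, $B_9$ is symmetric. The elements of $B_9$ are found by direct calculation. For instance, entry (1,1) equals $\sum_a 1/p_a^2$, where the sum runs over the alleles $a$ of the marker.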
Since the expectation has been calculated, to derive the variance it remains only to find the second moment of the LR, i.e., the expectation of its square.
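The matrices $B_1, \ldots, B_9$ are symmetric $9 \times 9$ matrices. The simplest of these is $B_9$, given in Table 4. In general, $B_i$ consists of the expected products of the likelihood ratios for $J_j$ and $J_k$, each against unrelated, when the marker data are generated by $J_i$. The values for $i,j,k = 7, 8, 9$ have been provided in the “Review of previous results” section. Entry $(j,k)$ of $B_i$ is
[Equation (27)]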
All matrices can in principle be found from the above expression, but exact calculation by hand becomes impractical, and exact numerical calculation is more reasonable.
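A minimal numerical sketch of such a calculation is given below. It rests on two assumptions that should be checked against the definitions in the main text: that the Jacquard states follow the conventional condensed ordering, and that entry (j,k) of B_i is the expectation, under J_i, of the product of the LRs for J_j and J_k against J_9, which equals the sum over genotype pairs of P_i·P_j·P_k/P_9². All names and the allele frequencies are illustrative.

```python
from itertools import product
from collections import defaultdict

# IBD pattern of (a1, a2, b1, b2) for the condensed Jacquard states J1..J9,
# where a1, a2 are the alleles of individual A and b1, b2 those of B.
# Equal labels mean "identical by descent" (assumed conventional ordering).
STATES = [
    (0, 0, 0, 0),  # J1
    (0, 0, 1, 1),  # J2
    (0, 0, 0, 1),  # J3
    (0, 0, 1, 2),  # J4
    (0, 1, 0, 0),  # J5
    (0, 1, 2, 2),  # J6
    (0, 1, 0, 1),  # J7
    (0, 1, 0, 2),  # J8
    (0, 1, 2, 3),  # J9 (unrelated, no inbreeding)
]

def joint_probs(state, freqs):
    """P(genotype of A, genotype of B | Jacquard state), genotypes unordered."""
    n_blocks = max(state) + 1
    out = defaultdict(float)
    # Assign an allele independently to each IBD block and accumulate.
    for alleles in product(range(len(freqs)), repeat=n_blocks):
        p = 1.0
        for x in alleles:
            p *= freqs[x]
        a1, a2, b1, b2 = (alleles[label] for label in state)
        out[(tuple(sorted((a1, a2))), tuple(sorted((b1, b2))))] += p
    return out

def b_matrices(freqs):
    """B[i][j][k] = sum over genotype pairs of P_i * P_j * P_k / P_9**2."""
    dists = [joint_probs(s, freqs) for s in STATES]
    unrelated = dists[8]
    B = [[[0.0] * 9 for _ in range(9)] for _ in range(9)]
    # Every genotype pair has positive probability under J9.
    for g, p9 in unrelated.items():
        pg = [d.get(g, 0.0) for d in dists]
        for i, j, k in product(range(9), repeat=3):
            B[i][j][k] += pg[i] * pg[j] * pg[k] / p9 ** 2
    return B

freqs = [0.1, 0.3, 0.6]  # hypothetical allele frequencies
B = b_matrices(freqs)
B9 = B[8]
print(B9[0][0], sum(1 / p ** 2 for p in freqs))
```

The enumeration grows quickly with the number of alleles, so this direct approach is only meant for small examples. For the frequencies above, entry (1,1) of the computed B_9 can be compared with the sum of 1/p_a² printed alongside it.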
Funding
Open Access funding provided by Norwegian University of Life Sciences.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
None required as no data from humans are used.
Contributor Information
Hilde Kjelgaard Brustad, Email: hilde.brustad@nmbu.no.
Magnus Dehli Vigeland, Email: m.d.vigeland@medisin.uio.no.
Thore Egeland, Email: thore.egeland@nmbu.no.
References
- 1. Slooten KJ, Egeland T. Exclusion probabilities and likelihood ratios with applications to kinship problems. Int J Legal Med. 2014;128(3):415–425. doi: 10.1007/s00414-013-0938-0.
- 2. Egeland T, Slooten KJ. The likelihood ratio as a random variable for linked markers in kinship analysis. Int J Legal Med. 2016;130(6):1445–1456. doi: 10.1007/s00414-016-1416-2.
- 3. Vigeland MD, Egeland T. Handling founder inbreeding in forensic kinship analysis. Forensic Science International: Genetics Supplement Series. 2019. doi: 10.1016/j.fsigss.2019.10.175.
- 4. Vigeland MD. Relatedness coefficients in pedigrees with inbred founders. J Math Biol. 2020;81:185–207. doi: 10.1007/s00285-020-01505-x.
- 5. Thompson EA. Statistical inference from genetic data on pedigrees. IMS; 2000.
- 6. Wright S. Coefficients of inbreeding and relationship. The American Naturalist. 1922;56:330–338. doi: 10.1086/279872.
- 7. Buckleton J, Curran J, Goudet J, Taylor D, Thiery A, Weir BS. Population-specific FST values for forensic STR markers: a worldwide survey. Forensic Science International: Genetics. 2016;23:91–100. doi: 10.1016/j.fsigen.2016.03.004.
- 8. Pemberton TJ, Rosenberg NA. Population-genetic influences on genomic estimates of the inbreeding coefficient: a global perspective. Human Heredity. 2014;77(1-4):37–48. doi: 10.1159/000362878.
- 9. Jacquard A. Genetic information given by a relative. Biometrics. 1972;28(4):1101–1114. doi: 10.2307/2528643.
- 10. Cotterman CW. A calculus for statistico-genetics. Dissertation, The Ohio State University; 1940.
- 11. Thompson EA. A restriction on the space of genetic relationships. Ann Hum Genet. 1976;40(2):201–204. doi: 10.1111/j.1469-1809.1976.tb00181.x.
- 12. Balding DJ, Nichols RA. A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica. 1995;96(1–2):3–12. doi: 10.1007/BF01441146.
- 13. Buckleton J, Triggs CM, Walsh SJ. Forensic DNA evidence interpretation. Florida: CRC Press; 2005.
- 14. Gorlin JB, Polesky HF. The use and abuse of the full-sibling and half-sibling indices. Transfusion. 2000;40(9):1148–1149. doi: 10.1046/j.1537-2995.2000.40091148.x.
- 15. Mayor LR, Balding DJ. Discrimination of half-siblings when maternal genotypes are known. Forensic Sci Int. 2006;159(2–3):141–147. doi: 10.1016/j.forsciint.2005.07.007.
- 16. Browning BL, Browning SR. A fast, powerful method for detecting identity by descent. The American Journal of Human Genetics. 2011;88(2):173–182. doi: 10.1016/j.ajhg.2011.01.010.
- 17. Nothnagel M, Schmidtke J, Krawczak M. Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med. 2010;124(3):205–215. doi: 10.1007/s00414-009-0413-0.