PADRE: Pedigree-Aware Distant-Relationship Estimation

Jeffrey Staples; David J Witherspoon; Lynn B Jorde; Deborah A Nickerson; the University of Washington Center for Mendelian Genomics; Jennifer E Below; Chad D Huff

Am J Hum Genet. 2016 Jun 30;99(1):154–162. doi: 10.1016/j.ajhg.2016.05.020

PMCID: PMC5005450 PMID: 27374771

Abstract

Accurate estimation of shared ancestry is an important component of many genetic studies; current prediction tools accurately estimate pairwise genetic relationships up to the ninth degree. Pedigree-aware distant-relationship estimation (PADRE) combines relationship likelihoods generated by estimation of recent shared ancestry (ERSA) with likelihoods from family networks reconstructed by pedigree reconstruction and identification of a maximum unrelated set (PRIMUS), improving the power to detect distant relationships between pedigrees. Using PADRE, we estimated relationships from simulated pedigrees and three extended pedigrees, correctly predicting 20% more fourth- through ninth-degree simulated relationships than when using ERSA alone. By leveraging pedigree information, PADRE can even identify genealogical relationships between individuals who are genetically unrelated. For example, although 95% of 13th-degree relatives are genetically unrelated, in simulations, PADRE correctly predicted 50% of 13th-degree relationships to within one degree of relatedness. The improvement in prediction accuracy was consistent between simulated and actual pedigrees. We also applied PADRE to the HapMap3 CEU samples and report new cryptic relationships and validation of previously described relationships between families. PADRE greatly expands the range of relationships that can be estimated by using genetic data in pedigrees.

Keywords: relatedness estimation, pedigree reconstruction, genetic analysis

Introduction

Accurate prediction and verification of relationships among individuals is essential in a variety of genetic studies. Errors in pedigrees are common¹,²,³ and have adverse consequences, including biased phasing and family-based imputation results, inaccurate identification of Mendelian errors, and reduced power to detect linkage⁴ or family-based associations. Therefore, ensuring that the genetic relationships among the DNA samples match the reported pedigree structure is critical for accurate family-based genetic analysis.⁵ Detecting cryptic relationships can be important as well.⁶ Genetic relationships identified in population studies can be leveraged for improved haplotype phase inference, detection of population structure, genotype imputation, and study designs such as identical-by-descent (IBD) mapping and tests to detect multiple rare and common variants that contribute to disease.⁷,⁸,⁹,¹⁰,¹¹,¹²,¹³ The identification of relatives also plays an important role in forensic criminal investigations,¹⁴ identification of victims of mass disasters,¹⁵ and discovery of family history.

With close relationships (first through third degree), pedigree reconstruction can provide the kinship structure of the individuals in a genetic dataset.⁵ However, genetic datasets often contain relationships that are more distant than third degree, resulting in sparsely connected pedigrees that are unsuitable for reconstruction. Algorithms that consider IBD segment data, such as ERSA (estimation of recent shared ancestry),¹⁶,¹⁷ can accurately predict pairwise relationships up to ninth-degree relatives (e.g., fourth cousins), but they neither reconstruct pedigrees nor use information from known or observed pedigree structures in the data. Here, we introduce PADRE (pedigree-aware distant-relationship estimation), which combines the pedigree reconstruction of known or cryptic first- to third-degree relatives by PRIMUS (pedigree reconstruction and identification of a maximum unrelated set)⁵ with the accurate distant relationship predictions of ERSA.¹⁷ PADRE, which has been implemented as an extension of PRIMUS and ERSA, uses ERSA-generated relationship likelihoods to identify the highest composite likelihood connection between family networks reconstructed by PRIMUS (Figure S1), substantially improving the accuracy of the predictions and expanding the range of relationships that can be predicted.

Subjects and Methods

PADRE Algorithm

PADRE combines reconstructed pedigree information with distant pairwise relationship predictions to identify distant relationships between pedigrees, and it requires results from PRIMUS (v.1.8.0) and ERSA (v.2.1) as input (Figure S1). PRIMUS identifies family networks of individuals within a dataset, where each family network consists of the set of individuals with a detected first- through third-degree relationship to at least one other individual in the network. When PRIMUS reconstructs a dataset into family networks ($\mathrm{Net}_1, \mathrm{Net}_2, \ldots, \mathrm{Net}_n$), each network can be represented by one or more possible pedigrees that fit the genetic data, annotated here with a second subscript (e.g., $\mathrm{Net}_{11}, \mathrm{Net}_{12}, \ldots, \mathrm{Net}_{1j}$). PADRE tests for significant relationships between each pair of networks by using the likelihood ratio test in ERSA. This test compares the null model that two founders in two different networks, $x$ and $y$, are unrelated to the alternative model that the two individuals are $N$th-degree relatives. If a significant relationship is detected between any two founders, PADRE then identifies the best-fitting relationship between the two networks by using a composite likelihood framework. The significance threshold is 0.05 by default but can be Bonferroni-corrected for the number of founder-to-founder relationships tested by adding the command-line argument “--PADRE_multiple_test_correct”. For a significant relationship detected between founders $x$ and $y$, PADRE calculates the maximum composite likelihood for each possible $N$th-degree relationship between $x$ and $y$ by multiplying the cross-network pairwise relationship likelihoods from ERSA for each pair of pedigrees $\mathrm{Net}_{1i}$ and $\mathrm{Net}_{2j}$:

\hat{L}_{1i2j}(x, y \mid N) = L_{\mathrm{Net}_{1i}} \, L_{\mathrm{Net}_{2j}} \prod_{\substack{a \in \mathrm{Net}_{1i} \\ b \in \mathrm{Net}_{2j}}} \hat{L}_{ab}(s_{ab} \mid D_{ab}),

(Equation 1)

where $a$ and $b$ are individuals in the pedigrees $\mathrm{Net}_{1i}$ and $\mathrm{Net}_{2j}$, respectively; $D_{ab}$ is the degree of relatedness between $a$ and $b$ given the two pedigrees and given that founders $x$ and $y$ are $N$th-degree relatives; and $s_{ab}$ is the set of lengths of all detected IBD segments shared by $a$ and $b$. $\hat{L}_{ab}(s_{ab} \mid D_{ab})$ is the maximum likelihood of the observed IBD segments shared by $a$ and $b$, conditioned on the relationship distance $D_{ab}$ specified by $N$, $\mathrm{Net}_{1i}$, and $\mathrm{Net}_{2j}$. $L_{\mathrm{Net}_{1i}}$ and $L_{\mathrm{Net}_{2j}}$ are the composite pedigree likelihoods, each the product of the PRIMUS likelihoods for every pairwise relationship specified by $\mathrm{Net}_{1i}$ and $\mathrm{Net}_{2j}$, respectively. When $D_{ab}$ is less than 10, $\hat{L}_{ab}$ includes two more estimated parameters than the model with no relationship (the relationship distance and the number of shared segments conditioned on that distance); these likelihoods are calculated by ERSA.¹⁶
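Equation 1 can be made concrete with a short Python sketch that evaluates its logarithm for one candidate model (founders $x$ and $y$, degree $N$, pedigrees $i$ and $j$). The rule used here to obtain $D_{ab}$ is our assumption, not a statement of PADRE's code: with no consanguinity, an individual is related across networks only by descending from founder $x$ (or $y$), so $D_{ab}$ is $N$ plus the number of generations separating $a$ from $x$ and $b$ from $y$, and anyone not descended from $x$ or $y$ is treated as unrelated. The names `cross_degree`, `log_composite_likelihood`, and the `pair_loglik` callback (a stand-in for a lookup into ERSA's likelihood tables) are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical input: generations separating each genotyped individual from the
# candidate founder (0 = the founder itself); None = not a descendant of it.
Generations = Dict[str, Optional[int]]

def cross_degree(gen_a: Optional[int], gen_b: Optional[int], n: int) -> Optional[int]:
    """Degree D_ab implied by founders x and y being Nth-degree relatives.

    Assumes no consanguinity, so a is related to y only through founder x.
    Returns None when a and b are unrelated under the model.
    """
    if gen_a is None or gen_b is None:
        return None
    return n + gen_a + gen_b

def log_composite_likelihood(
    gens_x: Generations,
    gens_y: Generations,
    n: int,
    log_l_net1: float,
    log_l_net2: float,
    pair_loglik: Callable[[str, str, Optional[int]], float],
) -> float:
    """ln of Equation 1: ln L_Net1i + ln L_Net2j + sum over cross-network pairs
    of ln L_ab(s_ab | D_ab). Only genotyped individuals should appear in the maps."""
    total = log_l_net1 + log_l_net2
    for a, gen_a in gens_x.items():
        for b, gen_b in gens_y.items():
            total += pair_loglik(a, b, cross_degree(gen_a, gen_b, n))
    return total
```

In this toy form, `pair_loglik` returns a pair's log-likelihood for the implied degree (or for "unrelated" when it receives None); degrees above 9 would be handled by the approximation described next.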

Because many 10th- and most 11th-degree human relatives share no autosomal IBD segments from their most recent common ancestor,¹⁶ models involving relationships more distant than ninth degree require special consideration. Although such models also include two additional parameters, the maximum likelihood estimate of the number of shared segments is typically 0, resulting in a compressed free parameter space. Maximizing the likelihood of such models without accounting for the reduced free parameter space over-penalizes such distant relationships. We address this problem with the following approximation. Given that two individuals, $a$ and $b$, are genetic ninth-degree relatives, the unconditional maximum likelihood of a 10th-degree relationship between $a$ and the offspring of $b$ is obtained as follows: with 50% probability, the shared segment is inherited by the offspring of $b$, and the likelihood equals the ninth-degree relationship likelihood; otherwise, the likelihood equals the unrelated likelihood. Each further generation halves this probability again, so for $D_{ab} > 9$ the weight on the ninth-degree likelihood is $0.5^{D_{ab}-9}$. This approximation holds for relationship distances detectable by PADRE and beyond (see Figure S2) and leads to the following formula to approximate the likelihood of 10th-degree and more distant relationships in PADRE:

\hat{L}_{ab}(s_{ab} \mid D_{ab} > 9) = 0.5^{D_{ab} - 9} \, \hat{L}_{ab}(s_{ab} \mid D'_{ab} = 9) + \left(1 - 0.5^{D_{ab} - 9}\right) \hat{L}_{ab}(s_{ab} \mid \text{unrelated}),

(Equation 2)

and the effective degrees of freedom contributed by each pair, $g(D_{ab})$, are given by:

g(D_{ab}) = \begin{cases} 2, & D_{ab} \le 9 \\ 0.5^{D_{ab} - 9}, & D_{ab} > 9 \end{cases}

(Equation 3)
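Equations 2 and 3 translate directly into code. The sketch below works on the log scale, on the assumption that per-pair likelihoods are stored as log-likelihoods, and mixes the two terms of Equation 2 with a log-sum-exp step to avoid underflow; the function names and the log-scale inputs are our choices, not PADRE's interface.

```python
import math

def distant_loglik(d_ab: int, loglik_deg9: float, loglik_unrelated: float) -> float:
    """Equation 2 on the log scale, valid for D_ab > 9.

    loglik_deg9 is ln L_ab(s_ab | D'_ab = 9) and loglik_unrelated is
    ln L_ab(s_ab | unrelated); each generation beyond the ninth halves the
    weight on the ninth-degree term.
    """
    if d_ab <= 9:
        raise ValueError("Equation 2 applies only when D_ab > 9")
    w = 0.5 ** (d_ab - 9)
    # ln(w * e^u + (1 - w) * e^v), computed stably by factoring out max(u, v)
    m = max(loglik_deg9, loglik_unrelated)
    mixed = w * math.exp(loglik_deg9 - m) + (1.0 - w) * math.exp(loglik_unrelated - m)
    return m + math.log(mixed)

def effective_df(d_ab: int) -> float:
    """Equation 3: effective degrees of freedom g(D_ab)."""
    return 2.0 if d_ab <= 9 else 0.5 ** (d_ab - 9)

# A 10th-degree pair weights the ninth-degree and unrelated likelihoods equally:
print(distant_loglik(10, math.log(0.2), math.log(0.1)))  # ≈ -1.897, i.e. ln(0.15)
print(effective_df(9), effective_df(11))                 # 2.0 0.25
```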

To identify the best fitting model, PADRE calculates the composite likelihood Akaike information criterion (CL-AIC) for each possible fourth- through ninth-degree relationship between the founders of the two family networks via Equation 1.¹⁸ Because each network could have more than one possible pedigree, we evaluate all pairs of possible pedigrees identified by PRIMUS for each network and identify the pair of pedigrees that minimizes the CL-AIC of the two networks. For a given pair of pedigrees Net_1i and Net_2j, the CL-AIC is calculated according to Equation 4:

\mathrm{AIC}_{1i2j}(x, y \mid N) = 2k_{1i2j}(x, y, N) - 2\ln \hat{L}_{1i2j}(x, y \mid N) - \ln L_{\mathrm{Net}_{1i}} - \ln L_{\mathrm{Net}_{2j}},

(Equation 4)

where $k_{1i2j}(x, y, N)$ is the effective number of parameters in $\hat{L}_{1i2j}(x, y \mid N)$, given by Equation 5:

k_{1i2j}(x, y, N) = \sum_{\substack{a \in \mathrm{Net}_{1i} \\ b \in \mathrm{Net}_{2j}}} g(D_{ab}).

(Equation 5)
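Equations 4 and 5 can be sketched as follows, with Equation 4 transcribed term by term as written. One detail is our assumption: Equation 3 does not say how many parameters a pair that is unrelated under the model contributes, so the sketch assigns such pairs zero. All function names are hypothetical.

```python
from typing import Iterable, Optional

def effective_df(d_ab: Optional[int]) -> float:
    """Equation 3; d_ab is None when a and b are unrelated under the model.

    Assumption (not stated in the source): unrelated pairs contribute 0.
    """
    if d_ab is None:
        return 0.0
    return 2.0 if d_ab <= 9 else 0.5 ** (d_ab - 9)

def effective_k(cross_degrees: Iterable[Optional[int]]) -> float:
    """Equation 5: sum of g(D_ab) over all cross-network pairs (a, b)."""
    return sum(effective_df(d) for d in cross_degrees)

def cl_aic(k: float, log_l_composite: float, log_l_net1: float, log_l_net2: float) -> float:
    """Equation 4, term by term: 2k - 2 ln L_1i2j - ln L_Net1i - ln L_Net2j."""
    return 2.0 * k - 2.0 * log_l_composite - log_l_net1 - log_l_net2

k = effective_k([4, 5, 10, None])        # 2 + 2 + 0.5 + 0 = 4.5
print(k, cl_aic(k, -10.0, -1.0, -2.0))   # 4.5 32.0
```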

Finally, PADRE evaluates all pairs of possible pedigrees identified by PRIMUS for each network to identify the model that minimizes the CL-AIC of the two networks:

\mathrm{AIC}_{\min}(\mathrm{Net}_1, \mathrm{Net}_2) = \min_{\substack{x \in \mathrm{Net}_1,\; y \in \mathrm{Net}_2 \\ 4 \le N \le 9 \\ 1 \le i \le \mathrm{Net1}_{n},\; 1 \le j \le \mathrm{Net2}_{n}}} \mathrm{AIC}_{1i2j}(x, y \mid N).

(Equation 6)

For each pair of family networks, PADRE reports the pair of founders, their degree of relatedness, and the two pedigrees from the model specified by $\mathrm{AIC}_{\min}$. In a separate output file, PADRE provides the degree of relatedness between each pair of samples in the model specified by $\mathrm{AIC}_{\min}$.

PADRE takes as a command-line option the maximum degree of relatedness that PRIMUS used for reconstruction and adjusts the range of ERSA predictions to test all relationships more distant than that degree. By default, the maximum degree of relatedness is three, so PADRE considers all fourth- through ninth-degree relationships in ERSA.
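The search in Equation 6, together with the degree range set by the PRIMUS cutoff, can be sketched as a loop over candidate pedigree pairs, founder pairs, and degrees. This is a minimal illustration, assuming a hypothetical `aic` callback that evaluates Equation 4 for one model; the `Model` record and the function names are ours, not PADRE's.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

@dataclass(frozen=True)
class Model:
    x: str  # founder in network 1
    y: str  # founder in network 2
    n: int  # degree of relationship between x and y
    i: int  # index of the candidate pedigree for network 1
    j: int  # index of the candidate pedigree for network 2

def candidate_degrees(max_primus_degree: int = 3, max_ersa_degree: int = 9) -> range:
    """Degrees tested: everything beyond the PRIMUS cutoff, up to ninth degree."""
    return range(max_primus_degree + 1, max_ersa_degree + 1)

def best_model(
    founders1: Sequence[Sequence[str]],  # founders1[i] = founders of pedigree i, network 1
    founders2: Sequence[Sequence[str]],
    aic: Callable[[Model], float],       # hypothetical evaluator of Equation 4
    max_primus_degree: int = 3,
) -> Tuple[Model, float]:
    """Equation 6: minimize CL-AIC over pedigree pairs, founder pairs, and degrees."""
    best: Optional[Tuple[Model, float]] = None
    for (i, f1), (j, f2) in product(enumerate(founders1), enumerate(founders2)):
        for x, y, n in product(f1, f2, candidate_degrees(max_primus_degree)):
            model = Model(x, y, n, i, j)
            score = aic(model)
            if best is None or score < best[1]:  # keep the first minimum on ties
                best = (model, score)
    if best is None:
        raise ValueError("no candidate models")
    return best
```

Under the rule described above, the default cutoff gives `candidate_degrees()` = 4 through 9, while a second-degree PRIMUS cutoff would extend the search to third degree.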

Simulations

We simulated pedigrees to evaluate the accuracy and relative benefit of using PADRE to detect distant relationships. We used two identical 13-person, three-generation pedigree structures and connected one founder in each pedigree through a most recent common ancestor, varying the number of generations separating the two founders from that ancestor. Figure 1 illustrates a simulated pedigree in which founders A2 and B2 are ninth-degree relatives. To test the full range of predictions beyond the third degree, we generated versions of the pedigree in which individuals A2 and B2 are fourth- through ninth-degree relatives. For each version of the simulated pedigree structure, we created 100 different sets of genotypes by using the method described in Morrison.¹⁹ We randomly selected haplotypes with ∼1 M SNPs from among the unrelated HapMap3²⁰ CEU (Utah residents with ancestry from northern and western Europe from the CEPH collection) samples and assigned them to all founders (individuals with red symbols in Figure 1). The unrelated set of CEU samples was determined by running ERSA (v.2.1) on all the HapMap3 CEU samples and then running the IMUS algorithm within PRIMUS²¹ to identify the maximum unrelated set of individuals. We then used Morrison’s recombination simulation software to propagate the founder genotypes through the pedigree. This method simulates recombination events as a homogeneous Poisson process by using the genetic map provided with the HapMap3 data, disregarding the centromere. Genotypes were removed for all individuals not included in either of the 13-person pedigrees. IBD estimates were calculated with PLINK v.1.9:²²

plink --file [data_file_root_name] --genome --maf 0.05 --geno 0.1 --out [data_file_root_name]

and all simulated pedigrees were reconstructed with PRIMUS (v.1.8):

run_PRIMUS.pl --p [data_file_root_name].genome
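The recombination model used in the simulations (crossovers as a homogeneous Poisson process along the genetic map, ignoring the centromere) can be illustrated for one meiosis on one chromosome. This is our own simplified sketch, not Morrison's software: it assumes marker positions given in Morgans and sorted in increasing order, a crossover rate of one per Morgan, and no crossover interference; the function names are hypothetical.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def crossover_positions(length_morgans: float, rng: random.Random) -> List[float]:
    """Crossover positions from a homogeneous Poisson process (rate 1 per Morgan).

    Gaps between successive crossovers are exponential with mean 1 Morgan.
    """
    positions: List[float] = []
    pos = rng.expovariate(1.0)
    while pos < length_morgans:
        positions.append(pos)
        pos += rng.expovariate(1.0)
    return positions

def transmit(hap0: Sequence[T], hap1: Sequence[T], genetic_pos: Sequence[float],
             rng: random.Random) -> List[T]:
    """One gamete: copy alleles from the parent's two haplotypes, switching
    haplotype at every crossover that falls before a marker."""
    crossovers = crossover_positions(genetic_pos[-1], rng)
    start = rng.randrange(2)  # haplotype the gamete starts on, chosen at random
    gamete: List[T] = []
    for k, pos in enumerate(genetic_pos):
        switches = bisect_right(crossovers, pos)  # crossovers at or before this marker
        source = hap0 if (start + switches) % 2 == 0 else hap1
        gamete.append(source[k])
    return gamete

rng = random.Random(42)
child_hap = transmit(["a"] * 6, ["b"] * 6, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0], rng)
print(child_hap)  # blocks of 'a' and 'b'; the exact pattern depends on the seed
```

Applying `transmit` to each parent of each non-founder, generation by generation, would propagate founder haplotypes through a pedigree in the same spirit.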

Figure 1. Pedigree Structure Used to Simulate Ninth-Degree Pedigrees

One hundred ninth-degree pedigrees, each with different genotypes, were generated with A2 and B2 related as ninth-degree relatives. The same pedigree structure for samples A1–A13 and B1–B13 was used to generate 100 pedigrees, each with different genotypes, for each degree of relatedness between A2 and B2 from fourth through ninth degree. The number of ancestral relatives was adjusted to account for the different degrees of relatedness.

We obtained ERSA (v.2.1) results for each simulation as described below.

To test how PADRE's relationship predictions improved as the number of genotyped individuals in each pedigree increased, we first used PRIMUS, ERSA, and PADRE to analyze individuals A6 and B6 in each simulated pedigree (Figure 1). We repeated the analyses iteratively, adding in each iteration the genotypes of an additional randomly selected first- or second-degree relative of A6 and B6. We then performed a final analysis using all 13 individuals in each pedigree (see Figure 1).

Runtime

We evaluated PADRE runtime on a machine with an Intel Xeon CPU E5-2670 v.2 at 2.50 GHz and 14 GB of memory, excluding the time needed to load the ERSA likelihood files. The number of comparisons is the number of pairwise likelihoods looked up during the PADRE analysis and is the best single indicator of runtime. Each comparison is conducted at the innermost level of five nested for loops: (1) for each pair of networks, (2) for each pair of possible pedigrees within the networks, (3) for each pair of founders between each of the pedigrees in different networks, (4) for each degree of relatedness from fourth through ninth, and (5) for each pair of non-missing individuals between the two pedigrees.
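Because every lookup sits in the innermost of the five loops, the number of comparisons can be counted without evaluating any likelihood. The sketch below assumes a hypothetical summary of PRIMUS output in which each candidate pedigree is reduced to its number of founders and its number of genotyped (non-missing) individuals; it illustrates why runtime grows combinatorially and is not PADRE's code.

```python
from itertools import combinations
from typing import Sequence, Tuple

# Hypothetical summary of one candidate pedigree: (n_founders, n_genotyped).
PedigreeSummary = Tuple[int, int]

def count_comparisons(networks: Sequence[Sequence[PedigreeSummary]], n_degrees: int = 6) -> int:
    """Number of pairwise-likelihood lookups made by the five nested loops.

    networks[k] lists the candidate pedigrees reported for network k.
    n_degrees = 6 corresponds to fourth- through ninth-degree relationships.
    """
    total = 0
    for net1, net2 in combinations(networks, 2):      # (1) each pair of networks
        for founders1, genotyped1 in net1:             # (2) each pair of candidate
            for founders2, genotyped2 in net2:         #     pedigrees
                total += (founders1 * founders2        # (3) founder pairs
                          * n_degrees                  # (4) degrees tested
                          * genotyped1 * genotyped2)   # (5) genotyped pairs
    return total
```

For example, two networks that each have four candidate pedigrees with four founders and ten genotyped members already require 16 × 16 × 6 × 100 = 153,600 lookups.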

The variability in comparisons per second reflects variability in the other PADRE calculations. When run on the European ancestry dataset with a third-degree relatedness cutoff, PRIMUS could not complete the reconstruction of all family networks, because some family networks yielded too many possible pedigree structures consistent with the genetic data. The results of these runtime comparisons are shown in Table S1.

Extended Pedigree Samples

We analyzed Affymetrix 6.0 SNP microarray data on 169 individuals from three previously described extended pedigrees with predominantly northern European ancestry.¹⁶ The three pedigrees were validated as described in Huff et al.,¹⁶ comprise 24, 30, and 115 genotyped individuals, and include a total of 7,266 previously described relationships between pairs of individuals.

HapMap3 CEU Samples

Using 165 CEU individuals from HapMap3 release 2²⁰ obtained from the HapMap website (see Web Resources), we reconstructed pedigree structures in this dataset with PRIMUS, as described below, using the default settings. When detecting initial relationships between the family networks identified by PRIMUS, we applied a Bonferroni-corrected significance threshold of p = 5.5 × 10⁻⁶ (0.05/9,074 founder-to-founder relationships).
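The Bonferroni threshold is simple arithmetic. As a sketch, assuming a single candidate pedigree per network (so that the founder-to-founder tests are all pairs of founders in distinct networks), the number of tests and the corrected threshold can be computed as follows; the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence

def n_founder_pairs(founders_per_network: Sequence[int]) -> int:
    """Founder-to-founder tests: every founder paired with every founder of
    each other network (assumes one candidate pedigree per network)."""
    return sum(a * b for a, b in combinations(founders_per_network, 2))

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error rate at most alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

print(n_founder_pairs([2, 3, 4]))                 # 2*3 + 2*4 + 3*4 = 26
print(f"{bonferroni_threshold(0.05, 9074):.2e}")  # 5.51e-06
```

With 9,074 founder-to-founder tests, 0.05/9,074 ≈ 5.5 × 10⁻⁶, the threshold used for the CEU analysis.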

Pedigree Reconstruction with PRIMUS

PRIMUS uses genome-wide IBD estimates to identify families and reconstruct all possible pedigrees that fit the genetic data, using relationships as distant as third degree. We used the prePRIMUS IBD pipeline⁵ to generate genome-wide average IBD estimates between all samples in each pedigree and used PRIMUS (v.1.8) to reconstruct pedigrees. The command-line options used were “--file [data_file_root_name]” and “--genome”. Because the sparse sampling of genotyped individuals in the three European ancestry pedigrees and in many of the simulations led to long PRIMUS runtimes, we applied a second-degree relatedness threshold in PRIMUS to both datasets by adding the command-line option “--degree_rel_cutoff 2”. We used the default relatedness cutoff of third-degree relatives for the HapMap3 CEU dataset.²⁰

Distant Relationships Prediction with ERSA

We applied the IBD detection pipeline described by Glusman et al.,¹⁷ first phasing all genetic data with Beagle (v.3.3.2)¹¹ using the phasing pipeline provided on the GERMLINE website (see Web Resources). We analyzed the phased data in GERMLINE (v.1.4.0)²³ for each chromosome with the following command:

germline -homoz -err_het 1 -err_hom 2 -map [data_root_name_chrN].map -min_m 2.5 < [data_root_name_chrN_options].txt

We analyzed the GERMLINE output files with ERSA (v.2.1) to calculate the likelihood of each possible pairwise relationship (from first through 39th degree) among all samples in the dataset. We controlled for potential false-positive IBD segments by masking genomic regions that show more than a 4-fold excess of pairwise IBD among the 1000 Genomes Project²⁴ CEU samples (mask_region_threshold = 4), as previously described:¹⁷

ersa --segment_files=[sample_data_germline.match_files] --model_output_file model_likelihoods.txt --output_file=ersa_results --confidence_level 0.999 --mask_common_shared_regions true --control_files=[CEU_germline.match_files]
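The masking idea can be illustrated with a simplified sketch that bins control-pair IBD segments into fixed-size windows and flags windows whose coverage exceeds four times the chromosome-wide mean. This is our own illustration of the concept behind mask_region_threshold = 4, assuming fixed windows and segment coordinates in base pairs; it is not ERSA's implementation, and the function name is hypothetical.

```python
from typing import List, Tuple

# One IBD segment shared by a control pair: (start_bp, end_bp), end exclusive.
Segment = Tuple[int, int]

def excess_ibd_windows(segments: List[Segment], chrom_length: int, window: int,
                       threshold: float = 4.0) -> List[int]:
    """Indices of windows whose pairwise-IBD coverage exceeds threshold x mean."""
    n_windows = (chrom_length + window - 1) // window
    coverage = [0] * n_windows
    for start, end in segments:
        first = start // window
        last = min((end - 1) // window, n_windows - 1)
        for w in range(first, last + 1):
            coverage[w] += 1  # one more control pair shares IBD in this window
    mean = sum(coverage) / n_windows
    return [w for w, c in enumerate(coverage) if mean > 0 and c > threshold * mean]
```

Segments from the study samples that fall in the flagged windows would then be discarded before computing relationship likelihoods.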

Results

To evaluate the improvements in relationship prediction, we ran PADRE on 600 simulated pedigrees, each with ten different patterns of genotyped individuals, and compared the accuracy of the resulting pairwise relationship predictions (see Subjects and Methods). Figure 2A shows that PADRE and ERSA alone exhibited the same accuracy when the individuals had no other first- or second-degree relatives in the pedigree. However, as additional genotyped individuals were included in the pedigrees, PADRE accurately predicted up to 56% more of the simulated relationships. In addition to higher relationship prediction accuracy, Figure 2A demonstrates that PADRE predicted relationships that are undetectable by methods that consider only pairwise genetic data. For example, PADRE detected over 50% of 13th-degree relationships, although 95% of 13th-degree human relatives share no genetic material through their most recent common ancestors. PADRE provided a substantial increase in power, correctly detecting up to 83% of seventh- through 13th-degree relationships in our simulations (Figure 2B).

Figure 2. Comparison of PADRE versus ERSA Accuracy in Simulated Data by Degree of Relationships

(A) The observed accuracy in the pedigree predictions increases as additional first- and second-degree relatives are added.

(B) Power of PADRE and ERSA alone to detect simulated relationships as additional first- and second-degree relatives were added to the pedigree. The ERSA results fluctuate slightly because additional pairwise estimates are added as more individuals are included, as described in the simulation procedure.

Figure 2 displays the ERSA and PADRE results for simulated pedigrees of up to 20 individuals, and Figure 3 summarizes the results for the simulated pedigrees with all 26 individuals. PADRE predicted the exact degree of relationship for 20% more fourth- through ninth-degree relationships than ERSA alone. For 10th- through 13th-degree relationships, ERSA accurately predicted only 4% of relationships to within one degree. In comparison, PADRE accurately predicted 59% of the simulated 10th- through 13th-degree relationships to within one degree, even though approximately 71% of such relatives share no IBD DNA segments (additional comparative data are shown in Figure S3). This is possible because a relationship between pedigree founders propagates through both pedigrees, yielding many pairwise comparisons that jointly constrain the founders' relationship; the resulting estimates of distant genealogical relationships are accurate even for pairs of descendants who inherited no genomic segments in common. Thus, by utilizing the pairwise sharing across all members of both pedigrees, PADRE is frequently able to predict very distant genealogical relationships between deeply genotyped pedigrees that are undetectable from single pairwise genetic comparisons.
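The two accuracy measures used in this section, exact degree and degree within one, can be stated precisely with a small helper. Scoring a pair that was not detected as related (represented here as None) as incorrect is our assumption about how undetected relationships are counted; the function name is hypothetical.

```python
from typing import Optional, Sequence

def accuracy(true_degrees: Sequence[int], predicted: Sequence[Optional[int]],
             tolerance: int = 0) -> float:
    """Fraction of pairs whose predicted degree is within `tolerance` of the truth.

    tolerance=0 gives exact-degree accuracy; tolerance=1 gives "within one degree".
    A prediction of None (no relationship detected) counts as incorrect.
    """
    if len(true_degrees) != len(predicted):
        raise ValueError("length mismatch")
    hits = sum(1 for t, p in zip(true_degrees, predicted)
               if p is not None and abs(p - t) <= tolerance)
    return hits / len(true_degrees)

print(accuracy([10, 11, 12, 13], [10, 12, None, 11]))               # 0.25
print(accuracy([10, 11, 12, 13], [10, 12, None, 11], tolerance=1))  # 0.5
```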

Figure 3. PADRE and ERSA Prediction Accuracy on Simulated Pedigrees Where All Individuals Have Been Genotyped

PADRE predicts fifth- through tenth-degree relationships more accurately than ERSA and frequently identifies 11th- through 13th-degree relatives who were undetectable by ERSA.

We also analyzed Affymetrix 6.0 microarray data from 169 individuals in three previously described extended pedigrees with predominantly northern European ancestry.¹⁶ The three pedigrees were composed of 24, 30, and 115 genotyped individuals and included a total of 7,266 pairs of related individuals. As expected, ERSA and PADRE attained the same accuracy for pairs of individuals with no genotyped first- or second-degree relatives. However, when we considered pairs of individuals who had two first- or second-degree relatives, we observed a substantial improvement in accuracy with PADRE (Figure 4), whereas ERSA’s accuracy rate was unchanged. PADRE correctly predicted 39% (95% confidence interval: 38% to 40%) of the 10th-degree relationships within one degree of relatedness when the individuals had two first- or second-degree relatives in the pedigree, in comparison to 23% (95% confidence interval: 22% to 24%) for ERSA alone. In addition, PADRE was able to detect 9% of the 11th-degree relationships, whereas ERSA did not detect any. The relationship prediction accuracy in this dataset increased as the number of first- and second-degree relatives in the pedigree increased, broadly matching the improvement we observed in our simulations (Figure 4). Effects of Bonferroni correction on relationship estimation accuracy in these data are shown in Figure S4.

Figure 4. Percentage of Relationships Correctly Predicted by PADRE to Within ± One Degree in Real Pedigrees of European Ancestry and Simulated Pedigrees

Relationship detection accuracy was broadly consistent between the real and simulated pedigrees. Because the real pedigrees included two or fewer first- or second-degree relatives, PADRE’s estimated relationship detection accuracy for pedigrees with three or more sampled relatives is based solely on simulated data.

We previously reconstructed 51 separate pedigrees within the HapMap3 CEU dataset.²⁰ These pedigrees contain between two and six individuals. PADRE identified previously unknown fourth- through ninth-degree relationships between 40 pairs of pedigrees, involving 594 pairs of individuals (Figure 5). Figure 6 illustrates one example in which PADRE predicts relationships connecting founders from four previously described CEU pedigrees.

Figure 5. A Graph of PADRE-Estimated Relationships among the CEU Samples with a Bonferroni-Adjusted Threshold of α = 0.05/9,090 = 5.5 × 10⁻⁶

Each node corresponds to a PRIMUS-reconstructed network, labeled with its network number, and an edge between nodes indicates a significant relationship predicted by PADRE using pairwise relationship likelihoods obtained by ERSA. The number next to each edge indicates the degree of the relationship connecting a founder in the reconstructed pedigree of each network. This type of network graph is the standard output of PADRE.

Figure 6. An Example of Four Distantly Related HapMap3 CEU Pedigrees with Relationships Predicted by PADRE

Although the trios and the full-sibling relationship between NA12813 and NA07045 have been previously reported, PADRE is able to identify statistically significant relationships connecting these distantly related pedigrees. The related pairs of founders are marked with the dotted lines, and the degree of relationship is labeled next to the line.

We have demonstrated through simulated and actual data that PADRE can leverage pedigree reconstruction results from PRIMUS and distant pairwise relationship predictions from ERSA to improve both the sensitivity and accuracy of distant relationship estimation. The power to detect relationships more distant than ninth degree depended on the number of generations with genotype data in the pedigrees. For instance, PADRE detected up to 13th-degree relationships in the simulated pedigrees with three generations of genotype data, including the pedigree founders (A2 and B2, Figure 1). As the depth of the pedigrees connected by PADRE increases, so will the distance of the relationships that PADRE can predict. Relationship estimation accuracy in PADRE improved as the number of genotyped individuals within each pedigree increased (Figures 2 and 4) and was highest when all individuals within a pedigree were genotyped (Figure 3).

We note that PADRE assumes absence of consanguinity and thus does not look for distant relationships within the reconstructed pedigree structures identified by PRIMUS. However, these types of relationships can be detected in other ways, for example, by using ISCA²⁵ and ERSA (v.2) to evaluate regions of the genome that are shared IBD on both chromosomes (IBD2) between founders within a pedigree. Although PADRE can connect a single pedigree, and even a single founder, to multiple other pedigrees, the algorithm is currently limited to establishing at most one distant relationship between founders of any given pair of pedigrees. Allowing for multiple relationships between founders of a pair of pedigrees will require modeling of independently inherited shared segments to prevent confounding and is a direction for future releases of PADRE.

Discussion

PADRE has several important and immediate applications in human genetic analysis, especially in large case-control studies. PADRE can detect cryptic fourth- through 13th-degree relationships, even in small datasets, as shown in our analysis of the CEU data (Figures 5 and 6). By identifying and appropriately modeling these relationships, studies can avoid findings biased by relatedness²⁶ and in some cases might be able to leverage familial relationships to improve power.²⁷ This is particularly important for detecting relatively high-penetrance risk alleles segregating in distantly related pedigrees.

Existing prediction algorithms for detecting distant pairwise relationships use the number and size of shared IBD segments between two individuals to estimate their degree of relatedness.¹⁶,²⁸ However, as the degree of relatedness increases, the expected number of shared segments approaches zero. Most 11th-degree human relatives share no segments of their autosomal DNA IBD;¹⁶ therefore, their degree of relatedness cannot be estimated by existing pairwise comparison programs. In some scenarios, PADRE can leverage reconstructed pedigrees to identify genealogical relationships between individuals who are genetically unrelated, i.e., who share no portion of their genome IBD through their most recent common ancestors.

PADRE runtime increases combinatorially with the number of family networks, the number of possible pedigrees within each family network, the number of founders in each pedigree, and the number of non-missing individuals in each pedigree structure in the PRIMUS results. These numbers are difficult to predict prior to running PRIMUS and depend heavily on how densely the pedigrees have been sampled (Figure S5). For some datasets, it will be necessary to use a closer relatedness cutoff for the PRIMUS reconstruction to limit the number of possible pedigrees generated, which in turn improves the runtime of PADRE. We employed this technique with the European ancestry pedigrees because of the sparse sampling of individuals. Table S1 and the accompanying text provide additional information on runtimes and computational limitations of PADRE.

There has been a resurgence of interest in large and deeply genotyped pedigrees in the search for the genetic basis of heritable complex disease traits. Pedigrees have become especially relevant in the detection of rare variant effects on disease because they are well suited to the study of rare variation.⁹ Under the hypothesis that multiple rare and common variants contribute to complex disease, projects such as the Alzheimer’s Disease Sequencing Project, the San Antonio Mexican American Family Studies, and the Jackson Heart Study have all undertaken deep whole-genome sequencing of members of clinically ascertained pedigrees. Projects such as these could benefit directly from the verification and detection of distant relatedness with PADRE.

PADRE leverages genome-wide average IBD sharing, as well as the size and distribution of shared IBD segments, to achieve a substantial improvement in accuracy over existing methods. PADRE has immediate relevance to a host of applications within genetics, allowing investigators to more accurately estimate cryptic relatedness, verify very distant relationships, and maximize power in analytic design. PADRE is freely available for academic use (see Web Resources).

Data Access

PADRE input data for the extended pedigrees have been made publicly available. The ERSA-derived shared segments (as described in Huff et al. 2011¹⁶), as well as the PRIMUS-derived pedigree likelihoods for the extended pedigree samples, are available on the PADRE website (see Web Resources).

Acknowledgments

We thank Lauren Petty for her helpful comments on the manuscript. J.S. was supported by the National Science Foundation Graduate Research Fellowship under grant DGE-0718124. D.A.N. was supported by the University of Washington Center for Mendelian Genomics (UW-CMG), funded by the National Human Genome Research Institute and the National Heart, Lung and Blood Institute, grant U54HG006493. C.D.H. was supported by R01 GM104390.

Published: June 30, 2016

Footnotes

Supplemental Data include five figures and one table and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2016.05.020.

Contributor Information

Jennifer E. Below, Email: jennifer.e.below@uth.tmc.edu.

Chad D. Huff, Email: chad@hufflab.org.

Web Resources

GERMLINE, http://www.cs.columbia.edu/~gusev/germline/phasing_pipeline.tar.gz
HapMap, http://hapmap.ncbi.nlm.nih.gov
PADRE, http://www.hufflab.org/software/padre

Supplemental Data

Document S1. Figures S1–S5 and Table S1

mmc1.pdf (858.5 KB, PDF)

Document S2. Article plus Supplemental Data

mmc2.pdf (1.9 MB, PDF)

References

1. Bellis M.A., Hughes K., Hughes S., Ashton J.R. Measuring paternal discrepancy and its public health consequences. J. Epidemiol. Community Health. 2005;59:749–754. doi: 10.1136/jech.2005.036517.
2. Kerr S.M., Campbell A., Murphy L., Hayward C., Jackson C., Wain L.V., Tobin M.D., Dominiczak A., Morris A., Smith B.H., Porteous D.J. Pedigree and genotyping quality analyses of over 10,000 DNA samples from the Generation Scotland: Scottish Family Health Study. BMC Med. Genet. 2013;14:38. doi: 10.1186/1471-2350-14-38.
3. Wolf M., Musch J., Enczmann J., Fischer J. Estimating the prevalence of nonpaternity in Germany. Hum. Nat. 2012;23:208–217. doi: 10.1007/s12110-012-9143-y.
4. Boehnke M., Cox N.J. Accurate inference of relationships in sib-pair linkage studies. Am. J. Hum. Genet. 1997;61:423–429. doi: 10.1086/514862.
5. Staples J., Qiao D., Cho M.H., Silverman E.K., Nickerson D.A., Below J.E., University of Washington Center for Mendelian Genomics. PRIMUS: rapid reconstruction of pedigrees from genome-wide estimates of identity by descent. Am. J. Hum. Genet. 2014;95:553–564. doi: 10.1016/j.ajhg.2014.10.005.
6. Palamara P.F., Francioli L.C., Wilton P.R., Genovese G., Gusev A., Finucane H.K., Sankararaman S., Sunyaev S.R., de Bakker P.I., Wakeley J., Genome of the Netherlands Consortium. Leveraging distant relatedness to quantify human mutation and gene-conversion rates. Am. J. Hum. Genet. 2015;97:775–789. doi: 10.1016/j.ajhg.2015.10.006.
7. Saad M., Wijsman E.M. Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees. Genet. Epidemiol. 2014;38:579–590. doi: 10.1002/gepi.21844.
8. Saad M., Wijsman E.M. Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes. Genet. Epidemiol. 2014;38:1–9. doi: 10.1002/gepi.21776.
9. Wijsman E.M. The role of large pedigrees in an era of high-throughput sequencing. Hum. Genet. 2012;131:1555–1563. doi: 10.1007/s00439-012-1190-2.
10. Browning S.R., Browning B.L. Identity by descent between distant relatives: detection and applications. Annu. Rev. Genet. 2012;46:617–633. doi: 10.1146/annurev-genet-110711-155534.
11. Browning S.R., Browning B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 2007;81:1084–1097. doi: 10.1086/521987.
12. Browning S.R., Thompson E.A. Detecting rare variant associations by identity-by-descent mapping in case-control studies. Genetics. 2012;190:1521–1531. doi: 10.1534/genetics.111.136937.
13. O’Connell J., Gurdasani D., Delaneau O., Pirastu N., Ulivi S., Cocca M., Traglia M., Huang J., Huffman J.E., Rudan I. A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet. 2014;10:e1004234. doi: 10.1371/journal.pgen.1004234.
14. Alvarez-Cubero M.J., Saiz M., Martinez-Gonzalez L.J., Alvarez J.C., Eisenberg A.J., Budowle B., Lorente J.A. Genetic identification of missing persons: DNA analysis of human remains and compromised samples. Pathobiology. 2012;79:228–238. doi: 10.1159/000334982.
15. Lin T.H., Myers E.W., Xing E.P. Interpreting anonymous DNA samples from mass disasters--probabilistic forensic inference using genetic markers. Bioinformatics. 2006;22:e298–e306. doi: 10.1093/bioinformatics/btl200.
16. Huff C.D., Witherspoon D.J., Simonson T.S., Xing J., Watkins W.S., Zhang Y., Tuohy T.M., Neklason D.W., Burt R.W., Guthery S.L. Maximum-likelihood estimation of recent shared ancestry (ERSA). Genome Res. 2011;21:768–774. doi: 10.1101/gr.115972.110.
17. Li H., Glusman G., Hu H., Shankaracharya, Caballero J., Hubley R., Witherspoon D., Guthery S.L., Mauldin D.E., Jorde L.B. Relationship estimation from whole-genome sequence data. PLoS Genet. 2014;10:e1004144. doi: 10.1371/journal.pgen.1004144.
18. Ng C.T., Joe H. Model comparison with composite likelihood information criteria. Bernoulli. 2014;20:1738–1764.
19. Morrison J. Characterization and correction of error in genome-wide IBD estimation for samples with population structure. Genet. Epidemiol. 2013;37:635–641. doi: 10.1002/gepi.21737.
20. Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298.
21. Staples J., Nickerson D.A., Below J.E. Utilizing graph theory to select the largest set of unrelated individuals for genetic analysis. Genet. Epidemiol. 2013;37:136–141. doi: 10.1002/gepi.21684.
22. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
23. Gusev A., Lowe J.K., Stoffel M., Daly M.J., Altshuler D., Breslow J.L., Friedman J.M., Pe’er I. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 2009;19:318–326. doi: 10.1101/gr.081398.108.
24. Abecasis G.R., Altshuler D., Auton A., Brooks L.D., Durbin R.M., Gibbs R.A., Hurles M.E., McVean G.A., The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534.
25. Roach J.C., Glusman G., Smit A.F., Huff C.D., Hubley R., Shannon P.T., Rowen L., Pant K.P., Goodman N., Bamshad M. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010;328:636–639. doi: 10.1126/science.1186802.
26. Voight B.F., Pritchard J.K. Confounding from cryptic relatedness in case-control association studies. PLoS Genet. 2005;1:e32. doi: 10.1371/journal.pgen.0010032.
27. Hu H., Roach J.C., Coon H., Guthery S.L., Voelkerding K.V., Margraf R.L., Durtschi J.D., Tavtigian S.V., Shankaracharya, Wu W. A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data. Nat. Biotechnol. 2014;32:663–669. doi: 10.1038/nbt.2895.
28. Browning B.L., Browning S.R. A fast, powerful method for detecting identity by descent. Am. J. Hum. Genet. 2011;88:173–182. doi: 10.1016/j.ajhg.2011.01.010.

