Abstract
Clinical genetic sequencing tests often identify variants of uncertain significance (VUS). One source of data that can help classify the pathogenicity of variants is familial cosegregation analysis. Identifying and genotyping relatives for cosegregation analysis can be time-consuming and costly. We propose an algorithm that describes a single measure of expected variant information gain from genotyping a single additional relative in a family. We then explore the performance of this algorithm by comparing actual recruitment strategies used in 35 families who had pursued cosegregation analysis with synthetic pedigrees of possible testing outcomes if the families had pursued an optimized testing strategy instead. For each actual and synthetic pedigree, we calculated the likelihood ratio of pathogenicity as each successive test was added to the pedigree. We analyzed the differences in cosegregation likelihood ratio over time resulting from actual versus optimized testing approaches. Employing the testing strategy indicated by the algorithm would have led to maximal information more rapidly in 30 of the 35 pedigrees (86%). Many clinical and research laboratories are involved in targeted cosegregation analysis. The algorithm we present can facilitate a data-driven approach to optimal relative recruitment and genotyping for cosegregation analysis and more efficient variant classification.
Keywords: Cosegregation Analysis, Variant Classification, Variant of Unknown Significance, Family Analysis, Optimization
Background
Genetic testing and Variants of Uncertain Significance
Genetic testing for inherited disease risk has helped many people reduce and manage their risk. The benefits of genetic testing are clear for autosomal dominant syndromes that dramatically increase the risk of preventable diseases like breast cancer, colon cancer, and coronary artery disease (Centers for Disease Control and Prevention [CDC], 2014). Although genetic testing may be beneficial to those who have a known pathogenic variant, many people receive genetic testing results that report a variant of uncertain significance (VUS). Most VUS are extremely rare variants where there is insufficient clinical data to know if the variant is associated with disease or not (Cooper, 2015; Starita et al., 2017). About 40% of variants listed in ClinVar are VUS (Landrum et al., 2014). In order to make clinical decisions, patients and providers faced with VUS rely on a number of strategies such as disregarding VUS status, assuming pathogenicity, or gathering more family information to estimate the variant’s pathogenicity using statistical approaches (Ranola et al., 2018).
Cosegregation Analysis
One of the most efficient ways to gather information about the clinical significance of a VUS is through family cosegregation analysis (Thompson, 1981; Tsai et al., 2019). Cosegregation likelihood estimation algorithms calculate the likelihood of pathogenicity of a specific variant based on the carrier status, distance from the proband, and disease status of family members. Familial cosegregation analysis is unique among the classification criteria of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) (Richards et al., 2015) in that it can give evidence supporting either benign or pathogenic classification, and the evidence that it provides is noted to come in varying strengths. Many clinical laboratories are involved in targeted cosegregation analysis, although each may set different criteria for which patients and their relatives qualify for testing and are included in the analysis (Garrett et al., 2016). These criteria include the gene in question, the specific genetic change in the variant, and the size of the pedigree (Garrett et al., 2016). Gathering sequencing and phenotyping information from families is a costly and time-intensive process that is dependent on laboratory eligibility criteria and the efforts of the patient or their genetic counselors (Shirts et al., 2014; Tsai et al., 2019). Each additional relative recruited and genotyped for a family VUS classification study requires financial and time resources. It is in everyone’s best interests to maximize the information gained per relative. However, there is currently no algorithm to determine the most informative relatives for VUS classification, and no tool currently exists to help patients or providers identify the potential informativeness of testing any particular family member next.
We previously developed the CoSeg R package to implement different cosegregation analysis methods and assist in quantitative cosegregation analysis of family data to help classify VUS (Ranola et al., 2018). When provided with a pedigree including the survival age and gender of the family members, plus any available disease and carrier status of the family members, CoSeg functions provide several quantitative cosegregation metrics, including the proband’s cosegregation likelihood ratio (CSLR) (Mohammadi et al., 2009). Use of the platform enables genetic counselors and providers to more easily calculate CSLR as more data become available.
Thompson (1981) demonstrated that the optimal strategy for choosing the next relative in linkage analysis, which is in many ways similar to cosegregation analysis, depends on the hypothesis being tested. Thompson sought an optimal strategy agnostic of the inheritance model and family structure. The question we are asking is more specific: given a specific pedigree with a specific pattern of known and unknown genotypes and phenotypes, which person would be the best person to genotype next? That individual may or may not actually carry the variant and may or may not reveal genotypes of others as being obligate carriers of the rare variant; thus, the problem is not trivial. By calculating who might be most informative, and testing those people, it may be possible to reduce the number of family members tested and the total costs of classifying the variant.
This study aims to describe the range of outcomes possible when pursuing an optimized strategy for classifying VUS. We define a strategy to calculate the relative informativeness of individuals in any given pedigree and penetrance model. We then present an algorithm that combines all possible outcomes into a single measure of expected variant information gain from genotyping a single additional relative. We then use simulations to demonstrate the difference in performance of such an approach when compared to actual rule-of-thumb strategies to recruit individuals for family studies. Finally, we make the code to perform these calculations available for public use.
Methods
Ethical Approval
Written consent was gathered from participating human subjects. Approval for the study was provided by The University of Washington Institutional Review Board (study number 00000092).
Overview
Cosegregation is calculated using pedigrees that include at least two individuals that have genotype information. Most often the initially tested proband is the only individual with a known genotype, so additional relatives must be recruited and genotyped. We developed an algorithm to quantify which family members are expected to give the most classification information. In order to test the validity and potential utility of this algorithm, we developed novel simulations to create synthetic pedigrees that had the same family structure and VUS as the pedigrees of patients participating in cosegregation analysis. The synthetic pedigrees represented every possible genotype outcome that could have been present in the tested family members. We then calculated the information gain over time that each sequential genotype provided in both real and synthetic pedigrees.
Statistical Algorithm to Calculate Expected Information from Genotyping Each Relative
For ungenotyped persons in a pedigree, expected information for variant classification can be calculated by iterating over each person whose genotype is unknown and running cosegregation analysis twice, first assuming they are a carrier and then assuming they are a non-carrier, and combining the two results as follows. For cosegregation, a likelihood ratio or Bayes factor value of 1 indicates that no information would be expected from genotyping this person. A value of 10 indicates an average 10-fold change in the cosegregation value. This change can be in either direction, i.e., more benign (lower likelihood of pathogenicity) or more pathogenic (higher likelihood of pathogenicity). A value of 0.1 also indicates a 10-fold change in likelihood. The change in cosegregation results for either a positive or a negative result is therefore normalized to a fold-change so that information for and against pathogenicity is considered equal. Each fold-change value is then weighted by the probability that the individual will or will not have the variant, based on their relationship to others known to have the variant, and these two weighted values are averaged.
This average takes the form:
$$\text{score} = p \cdot F_{\text{carrier}} + (1 - p) \cdot F_{\text{non-carrier}}$$
where $p$ is the probability that the individual carries the variant and $F$ is the fold-change ($\geq 1$) in the cosegregation likelihood ratio for the corresponding result.
Example: Suppose an individual has a 25% chance of having a variant. When they are genotyped, if they have the variant, the cosegregation likelihood would go up by 10-fold, but if they do not, the cosegregation likelihood would go down by 2-fold. The score would be $(0.25 \times 10) + (0.75 \times 2) = 4$. The calculation reports the average of the expected change in value given the possible outcomes. If the starting cosegregation likelihood ratio were 2.5 in the example above, testing the individual described would result in a final CSLR of (2.5 × 10) = 25 or (2.5 × ½) = 1.25. Thus, the tool is not designed to predict the cosegregation value after someone is genotyped; rather, it is designed to prioritize genotyping individuals that are expected to give the most information.
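The weighted fold-change average described above can be sketched in a few lines of Python. This is a minimal illustration, not the CoSeg implementation: the function names `fold_change` and `expected_fold_change` are hypothetical, the changes in likelihood ratio are assumed to be supplied as multiplicative factors, and the two weighted fold-changes are assumed to be summed.

```python
def fold_change(ratio):
    """Normalize a multiplicative change in likelihood ratio to a fold-change >= 1."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio if ratio >= 1 else 1.0 / ratio


def expected_fold_change(p_carrier, ratio_if_carrier, ratio_if_noncarrier):
    """Probability-weighted average of the fold-change for each possible result."""
    if not 0.0 <= p_carrier <= 1.0:
        raise ValueError("p_carrier must be a probability")
    return (p_carrier * fold_change(ratio_if_carrier)
            + (1.0 - p_carrier) * fold_change(ratio_if_noncarrier))


# Example from the text: 25% carrier probability, 10-fold up or 2-fold down.
score = expected_fold_change(0.25, 10.0, 0.5)
print(score)  # 0.25 * 10 + 0.75 * 2 = 4.0
```

Under this reading, a relative whose result cannot change the likelihood ratio in either direction scores 1, matching the statement that a value of 1 indicates no expected information.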
Algorithm Implementation
CoSeg contains an implementation of the CSLR algorithm written in R (Ranola et al., 2018). The CoSeg package contains tools for importing actual family pedigree data, generating synthetic pedigree data, and visualizing the results of CSLR analysis on a pedigree.
We have expanded the CoSeg package to include a function named RankMembers for identifying how informative a family member’s carrier status may be if added to the pedigree. Given a pedigree in which at least the proband’s carrier status is identified, the function estimates the amount of information that would be gained in a cosegregation analysis after genotyping any additional ungenotyped member of the pedigree. Pedigree members with unknown genotype can then be ranked based on the expected information they would provide in a cosegregation analysis.
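The ranking step can be illustrated with a language-neutral sketch in Python. It does not reproduce the RankMembers interface in CoSeg; `rank_members`, `likelihood_ratio`, and `carrier_probability` are hypothetical names, the pedigree is reduced to a mapping from member id to genotype, and the toy likelihood model at the end is invented purely so the example runs.

```python
from typing import Callable, Dict, List, Optional, Tuple

# True = carrier, False = non-carrier, None = not yet genotyped.
Genotypes = Dict[str, Optional[bool]]


def rank_members(
    genotypes: Genotypes,
    likelihood_ratio: Callable[[Genotypes], float],
    carrier_probability: Callable[[Genotypes, str], float],
) -> List[Tuple[str, float]]:
    """Score each ungenotyped member by expected fold-change, highest first.

    Assumes likelihood_ratio always returns a positive number.
    """
    baseline = likelihood_ratio(genotypes)
    scores = []
    for member, status in genotypes.items():
        if status is not None:
            continue  # already genotyped
        p = carrier_probability(genotypes, member)
        score = 0.0
        for outcome, weight in ((True, p), (False, 1.0 - p)):
            if weight == 0.0:
                continue  # impossible outcome contributes nothing
            trial = dict(genotypes)
            trial[member] = outcome
            ratio = likelihood_ratio(trial) / baseline
            score += weight * max(ratio, 1.0 / ratio)
        scores.append((member, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy stand-ins (hypothetical values, not a real cosegregation model).
FACTORS = {"sister": (4.0, 0.5), "cousin": (10.0, 0.8)}
PROBS = {"sister": 0.5, "cousin": 0.125}


def toy_lr(genotypes: Genotypes) -> float:
    lr = 1.0
    for member, status in genotypes.items():
        if member in FACTORS and status is not None:
            lr *= FACTORS[member][0] if status else FACTORS[member][1]
    return lr


ranking = rank_members(
    {"proband": True, "sister": None, "cousin": None},
    toy_lr,
    lambda genotypes, member: PROBS[member],
)
print(ranking)  # sister is ranked ahead of cousin in this toy example
```

Passing the cosegregation method in as a callable reflects the point that the ranking logic does not depend on which quantitative cosegregation method supplies the likelihood ratio.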
Algorithm Comparison with Actual Family Recruitment Results
1. Participants
Pedigree data were collected from subjects participating in the UW FindMyVariant Study (Tsai et al., 2019). Family members identified by the patient were offered the opportunity to join the study and have targeted genetic testing to identify if they were a carrier of the same variant as the patient. Along with identifying relatives, patients provided the following information about their relatives: age (or age of death), cancer status, and age of onset if positive for history of cancer. These data were collected from patients and family members, and were the most accurate versions of these pedigrees possible at the time of analysis. Testing occurred as identified relatives were selected for targeted testing and responded to requests by the patient and researchers. Relatives tested were asked to confirm their phenotype information and relationship with the proband (Tsai et al., 2019).
2. Generation of Synthetic Pedigrees
Synthetic pedigrees were generated according to the following algorithm. For each pedigree, time 0 (T0) is the initial state, where only the proband’s status is known. The next most informative relative was identified by selecting the relative with the highest average expected information, and two pedigrees were created representing the T1 state: one where the identified family member is a carrier, and another where they are not. For each of these pedigrees, the process was repeated, so that at T2 there would be four pedigrees, at T3 eight, and so on.
For each of our synthetic pedigrees, we limited the maximum number of family members to be tested to be the same as the number of family members tested in the pedigree on which that synthetic pedigree was based. This allowed a fair comparison of predicted optimal testing and actual testing given the number of tests actually done. This also resulted in an upper bound of $2^{t_{max}}$ synthetic pedigrees per original pedigree, where $t_{max}$ equals the number of family members genotyped after the proband in the actual pedigree.
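The branching procedure and its upper bound can be sketched as a short recursion. This is an illustrative Python sketch under simplifying assumptions, not the simulation code used in the study: `expand_outcomes` and `pick_next` are hypothetical names, the pedigree is reduced to a mapping of member id to genotype, and outcomes that are impossible under Mendelian inheritance are not pruned.

```python
from typing import Callable, Dict, List, Optional

# True = carrier, False = non-carrier, None = not yet genotyped.
Genotypes = Dict[str, Optional[bool]]


def expand_outcomes(
    genotypes: Genotypes,
    pick_next: Callable[[Genotypes], Optional[str]],
    t_max: int,
) -> List[Genotypes]:
    """Enumerate synthetic pedigrees reachable in at most t_max further tests."""
    if t_max == 0:
        return [genotypes]
    member = pick_next(genotypes)
    if member is None:  # nobody left to test: this branch stops early
        return [genotypes]
    leaves: List[Genotypes] = []
    for outcome in (True, False):  # carrier branch, then non-carrier branch
        child = dict(genotypes)
        child[member] = outcome
        leaves.extend(expand_outcomes(child, pick_next, t_max - 1))
    return leaves


def first_untested(genotypes: Genotypes) -> Optional[str]:
    """Toy selection rule; a real run would pick the highest-scoring relative."""
    for member, status in genotypes.items():
        if status is None:
            return member
    return None


start = {"proband": True, "mother": None, "sister": None, "uncle": None}
print(len(expand_outcomes(start, first_untested, 3)))  # 8 == 2 ** 3
print(len(expand_outcomes(start, first_untested, 5)))  # still 8: only 3 relatives
```

The second call shows why $2^{t_{max}}$ is an upper bound rather than an exact count: branches end as soon as no further relative can be selected.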
3. Estimating Informativeness of Newly Genotyped Individuals in Relation to a VUS
For each pedigree, we calculated the likelihood ratio of pathogenicity as each successive test was added to the pedigree. Information that leads to either benign or pathogenic classification is valuable. To express information gain in either direction, we used a measure of entropy from information science (Benish, 1999; Vollmer, 2007). Informativeness was used as a proportional measure of the predictive power of that result, and was calculated as:
Where
We scaled information with the assumption that families would not recruit additional individuals when there was sufficient information to classify a variant as likely pathogenic or likely benign (information was considered complete when there was a 95% probability of a variant being either pathogenic or benign). This informativeness was chosen as our measure of classification certainty in this study.
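One plausible form of an entropy-based informativeness measure, consistent with the description above and with the "1 − entropy" axis used in Figure 2, is sketched below in Python. Everything specific here is an assumption made for illustration: the use of binary Shannon entropy, the default prior of 0.5, the conversion from likelihood ratio to posterior probability, and the rescaling so that a 95% probability maps to 1. The function names are hypothetical.

```python
import math


def posterior_from_lr(likelihood_ratio, prior=0.5):
    """Posterior probability of pathogenicity from a likelihood ratio (assumed prior)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def binary_entropy(p):
    """Shannon entropy in bits of a two-outcome distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def scaled_information(p, threshold=0.95):
    """1 - entropy, rescaled so that reaching the threshold probability gives 1."""
    raw = 1.0 - binary_entropy(p)
    full = 1.0 - binary_entropy(threshold)
    return min(1.0, raw / full)


for lr in (1.0, 5.0, 19.0, 1.0 / 19.0):
    p = posterior_from_lr(lr)
    print(f"LR = {lr:.3f}  posterior = {p:.3f}  information = {scaled_information(p):.3f}")
```

Because entropy is symmetric around a probability of 0.5, this kind of measure treats evidence toward benign and toward pathogenic classification equally, which is the property the text asks of it.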
Results
The RankMembers Function
Our analysis is based on iterating through the newly implemented RankMembers function within the R CoSeg package. RankMembers provides output in two formats, a pedigree chart (Figure 1), and a data object containing the underlying likelihood averages.
Figure 1:
RankMembers Pedigree Output
Pedigree chart. Each family member is indicated by a circle (female) or square (male) indicating gender, the member id, age at data collection or death, and the average of the likelihood ratios of a positive or negative result expressed as a proportion, if calculable. Members who are already sequenced, or who are not genetic relatives of the proband, have this number replaced by NA.
Pedigrees Included in Analysis
Forty-four patients and their family pedigrees were available for analysis. Six pedigrees could not be analyzed using the CSLR algorithm because of family structures including loops/inbreeding or pedigree members with children from multiple partners (Mohammadi et al., 2009), or because of a lack of tested members. For three additional pedigrees, the computation time ran past the length allotted for the study. These three pedigrees were notable for being large (N = 46, 44, and 31), having extensive 2nd and 3rd degree relatives, few 1st degree relatives, and no descendants of the proband. Data from the remaining 35 families were included for analysis. The dimensions, affected characteristics, and case characteristics across these pedigrees are provided in Table 1.
Table 1:
Size of Pedigrees Included in Analysis
Statistic across 35 families | Degree: 1 (N = 35) | Degree: 2 (N = 33) | Degree: 3 (N = 30) | Degree: 4 (N = 19) | Degree: 5 (N = 9) | Degree: 6 (N = 1) | Degree: 7 (N = 1) | Degree: 8 (N = 1) | Degree: 9 (N = 1) |
---|---|---|---|---|---|---|---|---|---|
Number of Members, mean ± SD | 5.06 ± 1.83 | 8.73 ± 4.77 | 9.60 ± 4.76 | 5.53 ± 4.64 | 2.22 ± 1.20 | 7.00 ± NA | 3.00 ± NA | 4.00 ± NA | 4.00 ± NA |
Number of Confirmed Affected, mean ± SD | 0.83 ± 0.79 | 0.91 ± 1.07 | 1.10 ± 1.45 | 1.21 ± 1.58 | 0.89 ± 1.05 | 1.00 ± NA | 2.00 ± NA | 2.00 ± NA | 3.00 ± NA |
Number of Confirmed Carriers, mean ± SD | 1.43 ± 1.07 | 0.27 ± 0.52 | 0.20 ± 0.48 | 0.05 ± 0.23 | 0.00 ± 0.00 | 0.00 ± NA | 0.00 ± NA | 0.00 ± NA | 1.00 ± NA |
Number of Actual Genetic Tests Per Family
The relationship between reported pedigree size and the number of tests performed to facilitate variant classification in actual families (Table 2) was assessed using Pearson’s product-moment correlation. The estimated correlation was positive (r = 0.59), but the 95% confidence interval was wide and included zero (−0.05, 0.89). Thus, although family size serves as an upper limit on the number of tests a family may have, it does not necessarily predict the number of tests, and other factors also influence the probands’ selection of candidates for testing.
Table 2:
Sizes of Pedigrees and Numbers of Tests
Tests Conducted | N | Mean family size | Min-max size |
---|---|---|---|
1 | 4 | 32 | 23-47 |
2 | 7 | 22 | 4-38 |
3 | 1 | 39 | 39 |
4 | 9 | 28 | 16-37 |
5 | 3 | 24 | 9-38 |
6 | 1 | 45 | 45 |
7 | 5 | 37 | 25-53 |
9 | 2 | 30 | 18-42 |
10 | 2 | 36.5 | 26-47 |
16 | 1 | 48 | 48 |
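The correlation between number of tests and pedigree size can be approximated from the grouped values in Table 2. The Python sketch below rests on two assumptions that the text does not state: that the statistic is computed across the ten rows of Table 2 (tests conducted versus mean family size, unweighted by N), and that the confidence interval uses the Fisher z-transformation. Function names are illustrative.

```python
import math

# Rows of Table 2: number of tests conducted and mean family size for that row.
tests_conducted = [1, 2, 3, 4, 5, 6, 7, 9, 10, 16]
mean_family_size = [32, 22, 39, 28, 24, 45, 37, 30, 36.5, 48]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


r = pearson_r(tests_conducted, mean_family_size)
low, high = fisher_ci(r, len(tests_conducted))
print(f"r = {r:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

Under these assumptions the sketch gives a correlation of about 0.60 with an interval of roughly −0.05 to 0.89, close to the reported values, which suggests that the wide interval reflects the small number of rows rather than the 35 individual families.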
Performance over Time
Our analysis compared the performance of actual relative sample collection to the performance of a testing strategy optimized by the RankMembers algorithm. This simulation of an optimized testing strategy formed our synthetic pedigrees. At each test point, we model the effect on classification information if the relative tested were a carrier, or not, of the variant. Test 0 (T0) represents the start of study participation, where we only know the proband’s result. T1 would be upon receiving the results from the first relative, T2 one additional relative, and so on. Note that additional genetic tests may decrease, rather than increase, certainty of a diagnosis. This behavior is observable in both our real and our synthetic pedigrees. Because up to $2^t$ synthetic pedigrees are possible after t tests, the pedigrees with the most tests generated most of our data.
Figure 2 is an example of output used in our analysis. It has been deconstructed to show how the algorithm progresses over time. The proband is represented at T0, and classification information from family testing is 0 at this point. The first chart shows all possible outcomes after one additional family member is tested. The red line represents the change in classification information that was observed in the actual family given the result actually received. The blue lines show the results for our synthetic pedigrees, i.e., the two possible changes in diagnostic certainty. One line represents the effect on classification information of a positive test result and the other line that of a negative test result.
Figure 2:
Deconstructed Sample Chart Used in Analysis
Figure shows classification information through the testing sequence for a family. The Y axis is a measure of clinically actionable information (1 − entropy), from 0 to 1, where 1 represents classification as either likely pathogenic or likely benign. The X axis is the number of individuals tested at that point in sequential order. The red line is the classification information over time of a family that participated in the study. The blue lines represent the classification information of possible testing outcomes in simulated (synthetic) pedigrees.
The second chart in Figure 2 shows the effect of adding one additional test result to the prior results. It builds on the results in the first chart. Again, the red line is the result obtained by the families that participated in the study. For each of our optimized results, another two results are possible so that four synthetic outcomes are indicated at T2. At each test point, a blue line may stop or split, depending on whether further results are possible in the specific pedigree given Mendelian inheritance of a unique variant. Examples of this behavior are visible in the third chart of the figure.
This visualization makes it possible to compare the actual performance, in red, to performance against all possible outcomes of an optimized approach, in blue, given the same number of tests and same family structure, but undetermined genetic carrier status for individuals to be tested. The charts used in our analysis follow a similar design but are not deconstructed by time, so that all tests are visible in one series. This is shown in the final chart in Figure 2.
Figure 3 shows how informative our CSLR results are as more genotyping test results are added to the pedigree for all actual and synthetic data. The red lines are the results for actual participants. The blue lines are results for the same pedigrees executing up to the same number of tests, but following a path optimized by our algorithm.
Figure 3:
Illustration of Rate of Information Gain of Real and Synthetic Pedigrees, All Data
This figure illustrates all information content for all actual (red) and synthetic (blue) pedigrees.
An optimized strategy was considered performant for a pedigree if, within the study frame, a simulated search was the first to reach the highest level of classification certainty. This was the case for 30 of our 35 observed pedigrees (86%). Examples of representative pedigrees with 4, 7, 9, 10, and 16 genetic tests are provided in Figure 4. These pedigrees were selected as representative of our 30 performant pedigrees. Pedigrees with three or fewer tests are not displayed here but are available in the supplemental materials.
Figure 4:
Examples of Performant Optimized Approaches
Charts showing performance of 5 synthetic pedigrees including 4, 7, 9, 10, and 16 tests. These examples show how our optimized approach outperforms actual testing.
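The performance criterion used for these comparisons can be made concrete with a small Python sketch. It encodes only one possible reading of the definition, and the details are assumptions: each trajectory is a list of classification-information values indexed by test number (T0 first), the "highest result" is the maximum over the actual and all synthetic trajectories, and a tie in timing counts as not performant. The function names are hypothetical.

```python
def first_time_at(trajectory, level, tol=1e-9):
    """Index of the first test at which a trajectory reaches the given level."""
    for t, value in enumerate(trajectory):
        if value >= level - tol:
            return t
    return None


def optimized_is_performant(actual, synthetic_paths):
    """True if some synthetic path reaches the overall maximum before the actual path."""
    best = max([max(actual)] + [max(path) for path in synthetic_paths])
    actual_t = first_time_at(actual, best)
    synthetic_times = [
        t for t in (first_time_at(path, best) for path in synthetic_paths)
        if t is not None
    ]
    if not synthetic_times:
        return False  # only the actual strategy reached the maximum
    if actual_t is None:
        return True   # only an optimized path reached the maximum
    return min(synthetic_times) < actual_t


# Invented trajectories for illustration (not study data).
actual = [0.0, 0.2, 0.5, 1.0]
synthetic = [[0.0, 0.6, 1.0], [0.0, 0.1, 0.3, 0.4]]
print(optimized_is_performant(actual, synthetic))  # True: maximum reached at T2 vs T3
```

Other readings are possible, for example counting ties as performant or comparing only against the best-case synthetic path, and would change how borderline pedigrees are classified.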
Five pedigrees had more than nine people tested (See Supp. Figure S1). In families with this amount of testing we observe trends towards plateauing of classification certainty. Four pedigrees had only one test observation (See Supp. Figure S2). These are representative of the many real families which have few options for testing.
Predicted optimal strategies do not always find an outperforming solution or find that solution more quickly. There were two families where the predicted optimal strategy reached the same outcome as the actual approach taken, but took several additional tests to get there (see Supp. Figure S3). These may represent scenarios where phenotype data guides genotyping. For example, testing affected relatives may lead to quicker variant classification of pathogenic variants than the optimal predicted strategy; however, it also leads to much slower classification of benign variants and may also lead to misclassification of benign variants (Tsai et al., 2019). In addition, there were three pedigrees where the optimal strategy led to more information at first, but the actual strategy generated more information over time (See Supp. Figure S4).
Discussion
We developed an algorithm to calculate which relative would be most informative to genotype next for VUS classification. The output from our algorithm provides a summary score for unclassified family members. This score can be used to assign priorities for which family member to sequence next. Although we implement this algorithm using the CSLR method for cosegregation analysis (Mohammadi et al., 2009), it could theoretically be implemented using any quantitative method for gathering family classification data, even meiosis counting methods (Jarvik & Browning, 2016).
To evaluate the potential performance of this algorithm against actual practice, we developed a function to model all possible outcomes in a group of available pedigrees. We then compared theoretical optimized performance to actual strategies used for relative recruitment and genotyping for cosegregation analysis. Prior work in the area of pedigree analysis optimization has focused on modeled pedigrees or on providing alternative models of cosegregation analysis itself (Thompson et al., 2003; Thompson, 1981). Our evaluation strategy explored possible outcomes on different pedigree structures given alternately positive and negative test results at each test in an optimized testing strategy. This approach let us qualitatively explore the range of performance that these families could expect if pursuing an optimized strategy.
Because testing of family members is often dependent on convenience and closeness to the proband, we expected our optimal strategies to vary significantly from actual testing with potentially little overlap in which family members were tested (Mohammadi et al., 2009). Optimal strategies outperformed actual recruitment strategies 86% of the time. However, even in an optimized approach, sometimes it was not possible to achieve classification with certainty over 95% with a given pedigree. This is not surprising with cosegregation analysis. A patient with a given pedigree, or certain combinations of test results within their pedigree may not be able to achieve actionable classification within the given pedigree or do so within a certain number of tests.
We limited the number of tests in our optimized approach to match the number of tests conducted by actual families. This resulted in four pedigrees with only one test. In these scenarios the algorithm worked, revealing an optimal path, but the practical implications of using it when only one person is available to genotype are minimal.
Likelihood ratio results tended to be, at least initially, unstable. Our analysis identified that large changes in overall pedigree likelihood ratio are possible as more information is collected. This is an innate feature of cosegregation analysis that was observed in both actual and optimized approaches to pedigree testing. This suggests that, in practice, it would be prudent to continually update pedigrees and reassess recruitment priorities over time.
There are some important limitations for both the algorithm and our analysis of it. There are times when the proposed optimization algorithm cannot identify a next person to test. This result could indicate that no further testing should be performed because the most information possible has been extracted from the pedigree. In the current R-code implementation, this also occurred for pedigrees that do not fit CSLR assumptions (Mohammadi et al., 2009). For three families, the computation time of the algorithm extended beyond the scope of the study. Further work is needed to improve the efficiency of our algorithm in order to have it consistently perform quickly enough for clinical practice. In our approach, we simulated possible outcomes of users assuming they perfectly followed the algorithm’s recommendation. However, real-world collection of test samples is subject to exogenous factors such as psychosocial dynamics between family members, time-availability of family members, costs of testing, and health beliefs of participants (Garrett et al., 2016; Tsai et al., 2019). The estimate of informativeness provided may increase or decrease as more information is gained about family members. The fact that genotyping the predicted most informative member may lead to lower variant classification certainty in some situations may provide a surprising and disappointing result for patients if they harbor the expectation that the tool would increase the confidence of their variant classification. Due to interactions between these factors and our method, there is some element of uncertainty concerning outcomes in real clinical settings, and additional testing is needed to better understand how to utilize the tool effectively in clinical contexts. Finally, we used one estimate of cosegregation likelihood ratios, CSLR. Other quantitative cosegregation calculation techniques may perform differently (Ranola et al., 2018).
In spite of these limitations, this study demonstrates that VUS may be classified more quickly and at less expense than we are doing at present. Additionally, our data are representative of an array of pedigree dimensions, suggesting that performance is generalizable to a wide variety of pedigree shapes and sizes, as well as a variety of genes with different penetrance models.
Conclusions
Our analysis presents an algorithm to optimize recruitment for variant classification cosegregation analysis. We have demonstrated performance characteristics and given examples for how the method may perform in practice. Using this tool to illustrate potential downstream outcomes of future genetic testing may improve decision making by patients and providers, particularly where benign variant classification is likely or where sufficiently informative family members may or may not be available.
Decisions about genetic testing are made within a complex ecosystem in which families experience constraints due to family dynamics, finances, and prior test results. In prior work we have shown that gathering information from a relatively small number of individuals can make a substantial difference in variant classification, improving the accuracy of diagnosis (Tsai et al., 2019). However, the process, time, and cost to conduct these tests vary from laboratory to laboratory depending on their enrollment practices and policies (Garrett et al., 2016). Future work could consider the application of the tool in real-world contexts, potentially examining whether genetic counselors and/or patients make different decisions about testing when presented with information about potential outcomes through the use of the methods presented herein. A more advanced tool could be adapted to account for these constraints and provide counselors with forward-looking information on the potential effectiveness of various testing strategies in achieving a satisfactory variant classification.
Supplementary Material
Acknowledgments
Funding for this project came in part from a Damon Runyon-Rachleff Innovation Award (DRR-33-15), from the NIH (1R21HG008513), and from a Brotman Baty Institute for Precision Medicine grant.
Footnotes
Conflict of Interest Statement
The authors have no conflict of interest to declare.
Data Availability Statement
The CoSeg package is publicly available via R-Forge (https://r-forge.r-project.org/R/?group_id=2174). Pedigrees used as the basis for simulations have been previously published and are available in the supplemental files of Tsai et al. (2019).
References
- Benish WA (1999). Relative entropy as a measure of diagnostic information. Medical Decision Making, 19(2), 202–206. 10.1177/0272989X9901900211
- Centers for Disease Control and Prevention [CDC]. (2014). Tier 1 Genomics Applications and their Importance to Public Health. CDC. https://www.cdc.gov/genomics/implementation/toolkit/tier1.htm
- Cooper GM (2015). Parlez-vous VUS? Genome Res, 25(10), 1423–1426. 10.1101/gr.190116.115
- Garrett LT, Hickman N, Jacobson A, Bennett RL, Amendola LM, Rosenthal EA, & Shirts BH (2016). Family Studies for Classification of Variants of Uncertain Classification: Current Laboratory Clinical Practice and a New Web-Based Educational Tool. J Genet Couns, 25(6), 1146–1156. 10.1007/s10897-016-9993-2
- Jarvik GP, & Browning BL (2016). Consideration of Cosegregation in the Pathogenicity Classification of Genomic Variants. Am J Hum Genet, 98(6), 1077–1081. 10.1016/j.ajhg.2016.04.003
- Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, & Maglott DR (2014). ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res, 42(Database issue), D980–985. 10.1093/nar/gkt1113
- Mohammadi L, Vreeswijk MP, Oldenburg R, van den Ouweland A, Oosterwijk JC, van der Hout AH, Hoogerbrugge N, Ligtenberg M, Ausems MG, van der Luijt RB, Dommering CJ, Gille JJ, Verhoef S, Hogervorst FB, van Os TA, Gomez Garcia E, Blok MJ, Wijnen JT, Helmer Q, Devilee P, van Asperen CJ, & van Houwelingen HC (2009). A simple method for co-segregation analysis to evaluate the pathogenicity of unclassified variants; BRCA1 and BRCA2 as an example. BMC Cancer, 9(1), 211. 10.1186/1471-2407-9-211
- Ranola JMO, Liu Q, Rosenthal EA, & Shirts BH (2018). A comparison of cosegregation analysis methods for the clinical setting. Fam Cancer, 17(2), 295–302. 10.1007/s10689-017-0017-7
- Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, Grody WW, Hegde M, Lyon E, Spector E, Voelkerding K, Rehm HL, & Committee ALQA (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med, 17(5), 405–424. 10.1038/gim.2015.30
- Shirts BH, Jacobson A, Jarvik GP, & Browning BL (2014). Large numbers of individuals are required to classify and define risk for rare variants in known cancer risk genes. Genet Med, 16(7), 529–534. 10.1038/gim.2013.187
- Starita LM, Ahituv N, Dunham MJ, Kitzman JO, Roth FP, Seelig G, Shendure J, & Fowler DM (2017). Variant Interpretation: Functional Assays to the Rescue. Am J Hum Genet, 101(3), 315–325. 10.1016/j.ajhg.2017.07.014
- Thompson D, Easton DF, & Goldgar DE (2003). A full-likelihood method for the evaluation of causality of sequence variants from family data. Am J Hum Genet, 73(3), 652–655. 10.1086/378100
- Thompson EA (1981). Optimal sampling for pedigree analysis: relatives of affected probands. Am J Hum Genet, 33(6), 968–977. https://www.ncbi.nlm.nih.gov/pubmed/7325160
- Tsai GJ, Ranola JMO, Smith C, Garrett LT, Bergquist T, Casadei S, Bowen DJ, & Shirts BH (2019). Outcomes of 92 patient-driven family studies for reclassification of variants of uncertain significance. Genet Med, 21(6), 1435–1442. 10.1038/s41436-018-0335-7
- Vollmer RT (2007). Entropy and information content of laboratory test results. Am J Clin Pathol, 127(1), 60–65. 10.1309/H1F0WQW44F157XDU