Abstract
Mutagenesis-based screens in mice are a powerful discovery platform for identifying novel genes or gene functions associated with disease phenotypes. N-ethyl-N-nitrosourea (ENU) mutagenesis induces single nucleotide variants at random throughout the mouse genome. Subsequent phenotyping of mutant and wildtype mice then links phenotypes to particular ENU lesions and the pathways they disrupt. This unbiased approach to gene discovery performs phenotyping with no prior knowledge of the functional mutations. Before the advent of affordable next generation sequencing (NGS), identifying ENU variants was a limiting step in gene characterisation, akin to ‘finding a needle in a haystack’. The emergence of a reliable reference genome alongside advances in NGS has turned ENU mutation discovery from an arduous, time-consuming exercise into a rapid and effective one. This has allowed large mouse facilities worldwide to use ENU for novel mutation discovery in a high-throughput manner, helping to accelerate basic science at the mechanistic level. Here, we describe three different strategies used to identify ENU variants from NGS data and some of the subsequent steps for mutation characterisation.
Introduction
Forward genetic screens have been successful in identifying and functionally characterising hundreds of disease-related genes in mice (Acevedo-Arozena et al. 2008; Bull et al. 2013; Potter et al. 2015; Wang et al. 2015). This approach typically uses a DNA-damaging agent such as N-ethyl-N-nitrosourea (ENU) to mutagenise male (G0) mice, inducing random point mutations throughout the germline. The progeny of these mice are then screened to identify animals with phenotypes that mimic human disease and highlight key pathways. Because no particular gene is targeted, novel causative genes can be discovered without any prior annotation. Approximately 99 % of mouse genes have a homologue in the human genome, making the mouse an ideal model organism for studying human disease (Mouse Genome Sequencing et al. 2002). The mouse reference genome (C57BL/6J) was originally sequenced in 2001; since then, multiple updates to the assembly have made it a stable and reliable background against which to identify sequence variation (Church et al. 2009). This was, and remains, essential for identifying ENU mutations, because detection has traditionally involved locating the mutagenised region of interest via polymorphic markers. This traditional process has been fruitful but requires fine mapping of the candidate region followed by exon-by-exon sequencing; it was slow and labour intensive, and involved making assumptions about the genetic cause of the observed phenotype. With the advent of next generation sequencing (NGS), whole exome or genome sequence can be produced in weeks rather than years, and new analysis techniques based on these data are rapidly reducing the time needed to identify and characterise mutations. Here, we explore current and innovative strategies used to identify ENU mutations via NGS, their relevance to human disease and their impact on mouse genetics.
Next generation sequencing
Whole genome versus whole exome sequencing
There are many different NGS platforms ranging from those generating billions of short sequence reads of ~100 bp (Illumina), to those generating reads of >1000 bp, to those sequencing a single molecule. The comparison of these technologies is covered in other reviews (Quail et al. 2012; Mardis 2013). Early application of NGS undertook a ‘targeting’ approach where candidate regions resulting from positional mapping would be deep-sequenced in order to find the causative ENU lesion (Kurapati et al. 2012). Due to the reduction in sequencing cost, whole exome and whole genome approaches are becoming a mainstay for discovering novel mutations in mouse or human populations.
Whole exome sequencing (WES) typically refers to sequencing every protein-coding exon in the genome. It may also be extended to user-specified loci and non-coding regions, including microRNAs and lincRNAs. The exons targeted by capture libraries are usually defined by gene sets from reputable resources such as the Consensus Coding Sequence (CCDS) and RefSeq databases (Pruitt et al. 2009, 2014). As the exome represents approximately 1.5 % of the genome (Lander et al. 2001), significantly higher sequence coverage can be achieved with WES than with whole genome sequencing (WGS). For example, ~90 Gb of sequence data is required to achieve an average coverage of 30× across the whole genome, whereas only ~3 Gb is required for an average coverage of 75× across the whole exome (Voelkerding et al. 2009; Bainbridge et al. 2010). Deeper coverage is a clear advantage of exome sequencing, because read depth is directly correlated with the confidence of a single nucleotide variant (SNV) call. However, coverage is more uneven with WES than with WGS owing to biases in targeted capture; hence, higher mean coverage depths are required to detect coding variants, and some regions remain consistently difficult to capture (Sims et al. 2014). For example, a study comparing the human GENCODE annotation with exome capture arrays found 5594 genes missing from the array gene set and therefore inaccessible to WES (Coffey et al. 2011). NGS technologies have higher error rates than Sanger sequencing, leading to more false positives in mutation detection (Kircher and Kelso 2010; Ledergerber and Dessimoz 2011). This is partly offset by increasing sequencing depth, although systematic biases persist. Large-scale initiatives using WES to detect spontaneous and ENU-induced mouse mutations have reported a good success rate (~40–75 %) for novel mutation detection (Boles et al. 2009; Fairfield et al. 2011).
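The coverage figures above follow from simple arithmetic: required sequence yield is roughly target size multiplied by mean depth. The minimal sketch below assumes a ~3 Gb haploid genome (the quoted figures are for human; the mouse genome is of similar size) and deliberately ignores real-world overheads such as duplicate, off-target and unmapped reads. The function name is hypothetical.

```python
def data_required_gb(target_size_bp: float, mean_coverage: float) -> float:
    """Approximate sequence yield (Gb) needed to reach a mean depth over a target.

    Idealised: ignores duplicate reads, off-target capture and unmapped reads.
    """
    return target_size_bp * mean_coverage / 1e9


GENOME_BP = 3.0e9        # assumed haploid genome size (~3 Gb)
EXOME_FRACTION = 0.015   # exome ~1.5 % of the genome (Lander et al. 2001)

wgs_gb = data_required_gb(GENOME_BP, 30)                   # whole genome at 30x
wes_gb = data_required_gb(GENOME_BP * EXOME_FRACTION, 75)  # whole exome at 75x
print(f"WGS 30x: {wgs_gb:.1f} Gb; WES 75x: {wes_gb:.1f} Gb")
```

This gives ~90 Gb versus ~3.4 Gb, the same order of magnitude as the figures cited. Because capture efficiency is uneven, more raw WES data than this idealised estimate is needed to reach a given depth at every targeted base.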
However, WES relies on gene annotation databases, which will not contain undiscovered exons or regulatory sequences such as enhancers and promoters, regions increasingly recognised as important in disease. Moreover, larger sequence variants such as structural variants (e.g. large insertions, deletions or translocations) that span exon boundaries will remain undetected. Previous ENU studies detected the majority of ENU-induced mutations in coding exons (Nolan et al. 2000; Quwailid et al. 2004), which has favoured the deeper coverage offered by exome sequencing. However, the past ENU literature is likely to carry an ascertainment bias owing to the difficulty of identifying non-coding variants (e.g. those in repetitive regions with limited functional annotation). Interpreting these regions is becoming more tractable with resources that predict function in non-coding DNA (Stamatoyannopoulos 2012), and WGS will make such mutations easier to detect.
General NGS pipeline
Sequence analysis to discover ENU mutations requires three basic steps: (i) alignment to a reference genome, (ii) variant detection and (iii) variant annotation. This pipeline is usually run in an automated manner, prior to or in tandem with the isolation of the causative ENU mutation. This review concentrates mostly on the specific detection of novel or ENU-induced mutations and their characterisation, as part of the second and third steps. Briefly, mouse mutant sequence data are usually aligned to the reference (mm10) using a popular aligner (e.g. BWA, Maq). The alignment is the foundation for accurate mutation detection and is critical to identifying all possible variants. A good alignment currently maps ~98 % of the reads with default parameters (e.g. typically allowing two mismatches in the seed sequence). There is a plethora of widely used variant callers, including SAMtools (Li et al. 2009), Unified Genotyper in the Genome Analysis Toolkit (GATK) (DePristo et al. 2011) and Platypus (Rimmer et al. 2014). Variant calling typically involves two steps, genotype assessment and variant identification, and both steps differ between callers. Although many variants will be called by all callers, mutation detection is often carried out with more than one tool to minimise false positives. Several reviews compare the different variant callers (Liu et al. 2013; Pirooznia et al. 2014). Lastly, annotating variants in terms of genomic position, functional context and potential clinical impact has become an essential part of sequence variant analysis. ENU NGS pipelines typically determine the genomic annotation of each SNV: intronic, exonic, missense, nonsense, splice site, regulatory region, etc. Three popular tools for variant annotation are ANNOVAR (Wang et al. 2010), NGS-SNP (Grant et al. 2011) and Variant Effect Predictor (McLaren et al. 2010).
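Combining variant callers is often implemented by keeping only variants reported by at least two of them. The sketch below assumes uncompressed VCF text and matches variants on (CHROM, POS, REF, ALT); the helper names are hypothetical. A production pipeline would first normalise indel representation (e.g. left-alignment) and would more likely use an established tool such as `bcftools isec` than hand-written parsing.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect (CHROM, POS, REF, ALT) keys from VCF body lines; header lines are skipped."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multi-allelic records
            variants.add((chrom, int(pos), ref, allele))
    return variants


def consensus(*call_sets: Set[Variant], min_callers: int = 2) -> Set[Variant]:
    """Keep variants reported by at least `min_callers` of the supplied call sets."""
    counts = Counter(v for calls in call_sets for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


caller_a = parse_vcf_lines([
    "##fileformat=VCFv4.2",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr2\t2000\t.\tC\tT\t40\tPASS\t.",
])
caller_b = parse_vcf_lines([
    "chr1\t1000\t.\tA\tG\t60\tPASS\t.",
    "chr3\t3000\t.\tG\tA\t30\tPASS\t.",
])
print(consensus(caller_a, caller_b))  # {('chr1', 1000, 'A', 'G')}
```

Raising `min_callers` trades sensitivity for specificity, the same balance between false positives and missed mutations that applies to any variant filter.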
The impact of a sequence variant on the genome and phenotype is briefly discussed below. To our knowledge, relating a sequence variant directly to the phenotype is not yet standardised and remains a challenge for the bioinformatics field.
As NGS technologies and detection of novel mutations in ENU-induced mice become commonplace, the requirement to streamline the mutation detection process to ensure cost efficiency has increased. Different mouse breeding schemes and the mutation detection methods developed are discussed below.
ENU breeding and background
A variety of strains have been used in a range of phenotype-driven screens, which have been reviewed in detail elsewhere (Acevedo-Arozena et al. 2008; Andrews et al. 2012; Wang et al. 2015). The most commonly used background is C57BL/6J, because this strain retains fertility at higher doses of ENU (Justice et al. 2000), and the number of mutations induced is proportional to the dose of ENU (Russell et al. 1982). Breeding strategies, reviewed below and in Acevedo-Arozena et al. (2008), include first the simple outcross scheme, which enables rapid identification of a map location, and second the inbred scheme, which relies on sequencing to map mutations and increases the number of mutations present in G3 mice by breeding from two G0 mice. The main advantage of carrying out phenotypic screens on an inbred background is reduced variation in the data produced. Differences between strains in certain phenotypes increase variation in the baseline data, making subtle phenotypes harder to detect on a mixed genetic background and often requiring more mice to confirm a phenotype. For example, C57BL/6J mice have significantly lower bone mineral density than most other strains (Simon et al. 2013). However, this variance can also lead to the identification of phenotypic modifiers, which may or may not be advantageous to the screen. Additionally, certain inbred strains may be employed because of their susceptibility or resistance to particular phenotypes (Jonczyk et al. 2014; Banks et al. 2015).
A variety of breeding strategies have been used to maximise the number of mutations in the progeny that undergo screening. As long as a phenotype is detectable and amenable to relatively high-throughput screening, forward genetic screens can be used as a discovery platform to identify genes and pathways associated with a disease or biological process. A wide range of screens have been applied: from developmental processes and ex vivo and in vivo analysis of immune function (Andrews et al. 2012; Wang et al. 2015), through basic physiological functions (Hrabe de Angelis et al. 2000; Acevedo-Arozena et al. 2008), to more complex behavioural phenotypes (Nolan et al. 2000). Challenges can also be incorporated into mouse phenotyping pipelines to discover novel gene function, and screens have revealed modifiers of phenotypes or of disease progression (Vinuesa and Goodnow 2004; Buchovecky et al. 2013).
Alongside increasingly sophisticated phenotyping pipelines (Brown and Moore 2012), new and innovative ways have emerged to detect mutations using NGS, ranging from large structural variants to small insertions and deletions (indels) and single nucleotide variants (SNVs). ENU mutations are typically SNVs and, to a lesser extent, small indels. Since the emergence of NGS, ENU mutation detection strategies have evolved, making ENU an efficient and attractive method for generating mouse models of human disease (Andrews et al. 2012; Potter et al. 2015).
Methods for mutation mapping and detection
Method 1: candidate region approach
Whilst several phenotype-driven ENU screens have been run or are still underway, to our knowledge the Harwell Ageing Screen is the first to apply whole genome sequencing in a high-throughput, unbiased approach to discover genetic lesions that result in a detectable phenotype. The two mouse strains used by MRC Harwell to generate mutant mouse lines are C57BL/6J and C3H/HeH. Male mice are injected intraperitoneally with one dose of 120 mg/kg ENU followed by two doses of 100 mg/kg, with a week between each dose. These mutagenised males (G0) are then mated with wild type females to produce G1 mice, each heterozygous for the ENU-induced mutations it inherits. G1 mice can be subjected to phenotype-driven screening programmes to discover dominant mutations, or bred further to generate homozygous mutant mice (G3) to identify recessive mutations resulting in phenotypes. The Harwell Ageing Screen sequences the G1 mouse in order to detect all of the ENU-induced mutations carried within a pedigree. In parallel to G1 sequencing, G3 phenotyping is carried out. Once a phenotype of interest is identified (e.g. >3 mice are phenodeviant at any one timepoint), the affected G3 mice undergo positional mapping, which aims to identify the recombinant mapping region(s) containing the causative ENU lesion (Fig. 1). Typically, the breeding scheme includes a highly polymorphic background strain to provide polymorphic genetic markers flanking the ENU lesion. The size of the interval depends on the density of polymorphic markers and the number of recombination events. Figure 2 shows an ENU region in the genome flanked by polymorphic markers. Once the candidate region in the G3 mice has been narrowed to a manageable size (anything from ~30 Mb to a whole chromosome), all coding and non-coding variants in the corresponding region of the G1 genome are identified by the WGS mutation detection pipeline.
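For a recessive phenotype, the positional mapping step amounts to finding the interval in which every affected G3 mouse is homozygous for the mutagenised-strain allele. The minimal sketch below assumes marker genotypes coded as hypothetical strings (`"BB"` for homozygous mutagenised strain, e.g. C57BL/6J, and `"BC"` for heterozygous with the polymorphic strain, e.g. C3H/HeH), markers sorted by position on one chromosome, and no genotyping errors or incomplete penetrance; dedicated mapping software handles these complications.

```python
from typing import Dict, List, Optional, Tuple


def candidate_interval(
    positions: List[int],
    genotypes: Dict[str, List[str]],
    mutagenised: str = "BB",
) -> Optional[Tuple[int, int]]:
    """Return (left, right) bounds of the longest marker run in which every affected
    mouse is homozygous for the mutagenised-strain allele.

    Bounds are the flanking non-concordant (recombinant) markers, or the outermost
    concordant marker if the run reaches the end of the marker panel.
    """
    concordant = [
        all(calls[i] == mutagenised for calls in genotypes.values())
        for i in range(len(positions))
    ]
    best: Optional[Tuple[int, int]] = None  # (start_idx, end_idx), inclusive
    start: Optional[int] = None
    for i, ok in enumerate(concordant + [False]):  # sentinel closes the final run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    if best is None:
        return None
    left = positions[best[0] - 1] if best[0] > 0 else positions[best[0]]
    right = positions[best[1] + 1] if best[1] + 1 < len(positions) else positions[best[1]]
    return left, right


markers_mb = [10, 20, 30, 40, 50, 60]
affected = {
    "G3_1": ["BB", "BB", "BB", "BB", "BC", "BC"],
    "G3_2": ["BC", "BB", "BB", "BB", "BB", "BC"],
    "G3_3": ["BC", "BC", "BB", "BB", "BB", "BB"],
}
print(candidate_interval(markers_mb, affected))  # (20, 50)
```

Here the lesion is expected between the recombinant markers at 20 and 50 Mb, and the G1 variants in that window would be passed on for prioritisation.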
The NGS and mutation detection pipeline used at Harwell maps sequence reads to the mouse reference (currently mm10) and calls SNVs using an established SNV caller such as GATK or SAMtools. The variants are then prioritised (discussed below), and the G3 mice are genotyped for the chosen variants to confirm inheritance of the putative causative mutation. This ‘drill down’ approach allows rapid discovery of multiple causative ENU mutations in a pedigree from sequencing only one mouse, whilst also generating a library of potentially functional mutations in the G1 archive that is available for a gene-driven approach (Quwailid et al. 2004). The main challenge of mutation detection is distinguishing genuine ENU lesions from the background noise caused by nucleotide errors in the sequence reads. Several filtering steps are typically used to remove false positives, including one or more of the following: a read depth threshold, where variants supported by fewer than the allotted number of reads are ignored; a quality threshold, where variants in poorly mapped reads are ignored; and inbred SNP identification, where variants overlapping background SNV sites are ignored (Simon et al. 2012). This prioritisation and filtering of SNVs is a crucial step in the NGS pipeline: erroneous SNVs masquerading as real ENU variants can point to incorrect candidate genes, whereas over-filtering can exclude the real causal mutation, resulting in the failure of the experiment.
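The three filters listed above can be expressed as simple predicates over each call. This sketch uses a hypothetical `Call` record and illustrative thresholds (minimum depth 8, at least 3 alternate-allele reads, mean mapping quality 30); these values are assumptions for demonstration, not the thresholds used at Harwell, and in practice would be tuned to sequencing depth and caller.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int       # total reads covering the site
    alt_reads: int   # reads supporting the alternate allele
    mapq: float      # mean mapping quality of reads at the site


def filter_calls(
    calls: Iterable[Call],
    background: Set[Tuple[str, int, str]],
    min_depth: int = 8,
    min_alt_reads: int = 3,
    min_mapq: float = 30.0,
) -> List[Call]:
    """Apply read-depth, mapping-quality and background-SNP filters to candidate SNVs."""
    kept = []
    for c in calls:
        if c.depth < min_depth or c.alt_reads < min_alt_reads:
            continue  # read depth threshold
        if c.mapq < min_mapq:
            continue  # quality threshold: poorly mapped reads
        if (c.chrom, c.pos, c.alt) in background:
            continue  # known background strain variant, not ENU-induced
        kept.append(c)
    return kept


calls = [
    Call("chr1", 100, "A", "G", depth=20, alt_reads=9, mapq=60.0),   # passes
    Call("chr1", 200, "C", "T", depth=4, alt_reads=2, mapq=60.0),    # too shallow
    Call("chr2", 300, "G", "A", depth=25, alt_reads=12, mapq=12.0),  # poorly mapped
    Call("chr3", 400, "T", "C", depth=30, alt_reads=15, mapq=60.0),  # background SNP
]
background_snps = {("chr3", 400, "C")}
print([(c.chrom, c.pos) for c in filter_calls(calls, background_snps)])  # [('chr1', 100)]
```

Tightening any threshold removes more noise but raises the risk of over-filtering away the true causal mutation.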
To date, Harwell has used this NGS pipeline and mutation detection strategy on >70 mouse genomes, including 44 G1 and G3 genomes from the Harwell Ageing Screen. Coding ENU mutations (missense, splice and nonsense) were found in the candidate ENU regions of 41 of these 44 genomes. Further characterisation of these mutations is underway, including inheritance testing, secondary phenotype testing and molecular examination.
Method 2: rapid causative mutation finding without use of an outcross
Method 1 represents an early adoption of NGS for ENU mutation detection, relying on outcrossing and coarse mapping (Arnold et al. 2011; Fairfield et al. 2011; Leshchiner et al. 2012; Sun et al. 2012). A more efficient method for isolating causative ENU mutations should avoid outcrossing and be rapid, cost-effective, reliable and comprehensive.
Bull et al. published the first method to eliminate outcrossing to a second inbred strain, or additional breeding steps after G3, using an identity by descent (IBD)-based approach that infers shared genomic intervals across mice within a pedigree and simultaneously isolates causative ENU mutations (Bull et al. 2013). The method is based on low-coverage whole genome sequencing of multiple phenotypically affected mice and an implementation of the Lander–Green algorithm, a hidden Markov model-based method (Rabiner 1989). The algorithm uses knowledge of the pedigree structure to infer the inheritance of founder genotypes. In contrast, methods that simply search for shared mutations will pick up false positives caused by shared sequencing errors. Bull et al. found that excluding shared variants outside of shared genomic intervals removed 75 % of putative shared mutations. Further modelling and empirical data show that one or two candidate causative ENU mutations can be isolated by sequencing three G3 mice for a recessive trait or six G3 mice for a dominant trait (Fig. 3).
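The filtering idea reported by Bull et al., discarding shared variants that fall outside shared genomic intervals, can be illustrated as follows. This covers only the final filtering step: the shared intervals are supplied as input here, whereas in the published method they are inferred with the Lander–Green algorithm, which this sketch does not implement. Function names and the `min_mice` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

Variant = Tuple[str, int, str]   # (chrom, pos, alt)
Interval = Tuple[str, int, int]  # (chrom, start, end), inclusive


def in_intervals(v: Variant, intervals: Iterable[Interval]) -> bool:
    """True if the variant lies inside any of the given intervals."""
    chrom, pos, _ = v
    return any(c == chrom and start <= pos <= end for c, start, end in intervals)


def shared_candidates(
    calls_by_mouse: Dict[str, Set[Variant]],
    shared_intervals: Iterable[Interval],
    min_mice: Optional[int] = None,
) -> Set[Variant]:
    """Variants seen in at least `min_mice` affected mice (default: all of them) that
    also lie inside an interval inferred to be shared identical by descent."""
    intervals = list(shared_intervals)
    need = len(calls_by_mouse) if min_mice is None else min_mice
    counts = Counter(v for calls in calls_by_mouse.values() for v in calls)
    return {v for v, n in counts.items() if n >= need and in_intervals(v, intervals)}


calls = {
    "G3_a": {("chr5", 1_000_000, "T"), ("chr7", 500_000, "A")},
    "G3_b": {("chr5", 1_000_000, "T"), ("chr7", 500_000, "A"), ("chr9", 42, "G")},
    "G3_c": {("chr5", 1_000_000, "T"), ("chr7", 500_000, "A")},
}
ibd = [("chr5", 0, 2_000_000)]
print(shared_candidates(calls, ibd))  # {('chr5', 1000000, 'T')}
```

The chr7 variant is shared by all three mice but lies outside the shared interval, which is the kind of shared artefact this filter removes. With very low per-mouse coverage a true variant may not be called in every mouse, which is one reason to relax `min_mice`.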
Using whole genome rather than whole exome sequencing, regions inherited from an ENU ancestor can be fine-mapped from the density of variation, despite the scarcity of ENU variants across the inbred C57BL/6 genome. The depth of coverage in shared genomic intervals is the sum of the depth across all sequenced mice, and the method uses local genotype context to isolate a causative mutation. The coverage depth per mouse can therefore be very low: in this method, all affected individuals from a pedigree are sequenced on one lane of an Illumina HiSeq machine, achieving 12–15-fold combined coverage at the causative variant locus. Bull et al. found this was sufficient to reliably call a homozygous or heterozygous point mutation, since WGS has less variability in depth of coverage than WES (Sims et al. 2014).
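Why pooling helps can be seen under the idealised Lander–Waterman assumption that per-base depth is Poisson-distributed with mean equal to the average coverage. The sketch assumes three affected mice sequenced at ~4× each (pooling to ~12×, within the 12–15-fold range above) and an arbitrary requirement of at least five reads at a site; both numbers are illustrative assumptions. Real depth is overdispersed relative to the Poisson, more so for WES than for WGS.

```python
import math
from typing import Sequence


def pooled_depth(per_mouse_depths: Sequence[float]) -> float:
    """Combined mean depth at a shared locus: the sum across sequenced mice."""
    return sum(per_mouse_depths)


def prob_depth_at_least(mean_depth: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(mean_depth), an idealised model of per-base depth."""
    p_below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - p_below


per_mouse = [4.0, 4.0, 4.0]          # assumed per-mouse mean coverage
combined = pooled_depth(per_mouse)   # 12.0
single = prob_depth_at_least(per_mouse[0], 5)
pooled = prob_depth_at_least(combined, 5)
print(f"one mouse: {single:.2f}; pooled: {pooled:.3f}")
```

Under these assumptions a single mouse reaches five reads at a given base only about 37 % of the time, whereas the pooled depth does so about 99 % of the time.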
The current technique applies WGS to affected G3 individuals within a pedigree; therefore, the delay between identifying a phenotype of interest and isolating the mutation is the sum of the time to run the sequencing (typically 1–2 weeks), the time spent ‘queuing’ for a sequencing run (which varies between institutions) and the time to run the NGS pipeline. Whilst this is a significant improvement over earlier methods that relied on outcrossing and further breeding beyond G3 for mapping, an approach that generates genotyping data in parallel with phenotyping pipelines, as described by the Beutler group below, avoids this delay altogether. As the costs of WGS continue to fall, it will become feasible to apply WGS to all mice within a pedigree in parallel with phenotyping, rapidly generating a rich database linking phenotype and genotype across coding and non-coding regions.
Method 3: real time identification of ENU-induced mutations in mice
The above methods use massively parallel sequencing of whole mouse genomes or exomes and have arguably exposed genetic mapping as the rate-limiting step in forward genetics. Most ENU-induced mutations are easily found (Andrews et al. 2012); however, finding the causative mutation has remained a time-consuming task. Light sequencing of bar-coded samples from G3 mice for the purpose of genotyping remains a fairly costly proposition, and is usually applied post facto only to pedigrees that display a phenotype (Bull et al. 2013). This means that finding causative mutations is not truly a real-time process, and also precludes the systematic exoneration of non-causative mutations from the screen as a whole.
The Beutler lab developed an alternative approach that permits declaration of causative mutations concurrent with phenotypic screening (Wang et al. 2015), without a requirement for outcrossing and backcrossing or intercrossing as practiced in mapping based on meiotic recombination. Their approach combines exome sequencing and high-throughput genotyping to determine zygosity at all mutation sites in all G3 mice before phenotypic data are acquired, and uses automated computational mapping to assign causality in real time (for overview see Fig. 1). Mice are bred to produce 30–50 G3 mice per pedigree, a number sufficient to detect concordance between traits of moderate strength and homozygosity at a particular locus, assuming a neutral effect on viability. A single G1 male serves as the founder for each pedigree, and is subjected to whole exome sequencing to identify all possible mutations transmitted to G3 mice. Prior to phenotypic screening, the zygosity of these mutations is determined by genotyping G2 and G3 mice and data are uploaded to the Mutagenetix database to await linkage analysis together with phenotypic data. All 30–50 G3 mice are screened in a single experiment on the same day; with the exception of visible phenotypes (affecting, for example, coat colour or behaviour), phenotypic data are quantitative in nature.
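Pedigree sizes of 30–50 G3 mice can be related to Mendelian expectations. Assuming the classical backcross scheme (a G1 male crossed to wild-type females gives G2 daughters, half of which carry any given mutation; G2 daughters are then backcrossed to the G1 sire), a G3 mouse born to a carrier dam is homozygous with probability 1/4, and one whose dam's status is unknown with probability 1/8. The sketch below, with hypothetical names, ignores viability effects and litter structure.

```python
from math import comb

P_HOM_FROM_CARRIER_DAM = 1 / 4      # het G2 dam x het G1 sire
P_HOM_DAM_UNKNOWN = 1 / 2 * 1 / 4   # dam carries the mutation with probability 1/2


def prob_at_least(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Expected homozygotes per mutation in a 40-mouse G3 cohort when dam status is unknown
print(40 * P_HOM_DAM_UNKNOWN)  # 5.0

# Chance of at least 3 homozygotes among 20 G3 mice born to carrier dams
print(round(prob_at_least(20, 3, P_HOM_FROM_CARRIER_DAM), 3))
```

Because G2 mice are genotyped in this approach, only G3 offspring of carrier dams are informative for a given mutation, which is why the binomial is applied with p = 1/4 to that subset.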
Automated linkage analysis is performed by two software programs, Linkage Analyzer and Linkage Explorer, which are based on classical principles of genetic mapping: correlation is determined between the genotypes at mutated loci and either the presence or absence of a qualitative phenotype or the magnitude of a quantitative phenotype, under recessive, additive (semi-dominant) or dominant models of inheritance. This determination is made for each mutation site across all mice in a pedigree. Linkage is assessed from the probability of association between genotype and phenotype, calculated using a likelihood ratio test from a linear regression model (Wang et al. 2015). With this method, phenovariance is ascertained computationally, eliminating the need for the researcher to designate mice as affected or non-affected.
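The general principle of a regression-based likelihood ratio test under the three inheritance models can be sketched as follows. Genotypes are coded numerically per model and a Gaussian linear model is compared with an intercept-only model; the statistic n·log(RSS₀/RSS₁) is referred to a chi-square distribution with one degree of freedom. This is a minimal sketch of the textbook test; the exact model used by Linkage Analyzer (normalisation, covariates, handling of qualitative traits) is not reproduced, and the function name and `REF` label are hypothetical.

```python
import math
from typing import Sequence

# Numeric genotype codings for the three inheritance models
# (REF = wild type, HET = heterozygous, VAR = homozygous variant).
CODINGS = {
    "recessive": {"REF": 0.0, "HET": 0.0, "VAR": 1.0},
    "additive": {"REF": 0.0, "HET": 1.0, "VAR": 2.0},
    "dominant": {"REF": 0.0, "HET": 1.0, "VAR": 1.0},
}


def lrt_pvalue(genotypes: Sequence[str], phenotypes: Sequence[float], model: str) -> float:
    """P value (non-linkage) from a likelihood ratio test comparing
    phenotype ~ 1 against phenotype ~ 1 + genotype, assuming Gaussian errors."""
    x = [CODINGS[model][g] for g in genotypes]
    y = list(phenotypes)
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 1.0  # no genotype or phenotype variation: no evidence of linkage
    rss_null = syy                                   # intercept-only model
    rss_alt = max(syy - sxy**2 / sxx, 1e-12 * syy)   # simple regression; guards log(0)
    stat = n * math.log(rss_null / rss_alt)          # -2 log(likelihood ratio)
    return math.erfc(math.sqrt(stat / 2))            # chi-square (1 df) upper tail


g = ["REF", "HET", "HET", "VAR", "REF", "HET", "VAR", "VAR", "REF", "HET"]
y = [10.1, 9.8, 10.3, 5.2, 10.0, 9.9, 4.8, 5.5, 10.2, 10.1]
for model in CODINGS:
    print(model, f"{lrt_pvalue(g, y, model):.2e}")
```

For this made-up phenotype, seen only in VAR mice, the recessive model gives by far the smallest P value.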
Linkage Analyzer, the core mapping program, calculates probabilities of association between genotype and phenotype for every mutation subjected to every screen using recessive, additive and dominant transmission models. It detects associations with quantitative and qualitative traits and with lethal effects when homozygosity is significantly under-represented among G3 mice in a pedigree. Additionally, the program identifies complex linkage for phenotypes that depend on two unlinked mutations in any combination of zygosities. Over time, multiple variant alleles of most genes are tested phenotypically, and Linkage Analyzer can combine pedigrees with identical or non-identical allelic mutations to make “superpedigrees.” These are analysed as single pedigrees for genotype–phenotype associations including linkage to lethality.
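Linkage to lethality rests on detecting fewer homozygous G3 mice than Mendelian expectation. One simple way to quantify this, shown here as an assumption rather than Linkage Analyzer's actual test, is a one-sided binomial test against the 1/4 homozygote frequency expected among G3 offspring of a heterozygous G2 dam and the heterozygous G1 sire. The function name is hypothetical.

```python
from math import comb


def homozygote_deficit_pvalue(n_g3: int, n_homozygous: int, expected: float = 0.25) -> float:
    """One-sided binomial P(X <= n_homozygous) for X ~ Binomial(n_g3, expected).

    n_g3 counts only G3 mice born to carrier (heterozygous) G2 dams.
    """
    return sum(
        comb(n_g3, i) * expected**i * (1 - expected) ** (n_g3 - i)
        for i in range(n_homozygous + 1)
    )


# 24 genotyped G3 mice from carrier dams, none homozygous for the mutation
print(f"{homozygote_deficit_pvalue(24, 0):.4f}")
```

With about six homozygotes expected among 24 such mice, observing none gives P ≈ 0.001, suggestive of lethality; in a screen testing many mutations, multiple testing must also be taken into account.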
P values for non-linkage calculated by Linkage Analyzer are tabulated and presented by Linkage Explorer in an online format with one-click access to Manhattan plots for each phenotype and inheritance mode, and from there direct links lead to scatter plots of phenotypic data graphed versus genotype for every variant allele (Fig. 4). A key feature of Linkage Explorer is the ability to narrow or expand the list of positive associations by varying the stringency of criteria for linkage, and by targeting analyses to specific genes, phenotypes, pedigrees and mutation types or effects (Table 1). The nature of each mutation, PolyPhen-2 score, and its effect at the protein and gene levels are also accessed with a single click in Linkage Explorer.
Table 1.
Parameter | Notes |
---|---|
Single or double locus analysis | |
Gene | Will return all phenotypes linked to mutations of the specified gene(s), along with associated P values |
Phenotypic screen | When specified, will return mutations linked to the phenotype(s) tested in the specified screen(s) |
Pedigree or mouse/mice | Will return all genotype–phenotype associations identified in the specified pedigree or the pedigree of which the specified mouse (mice) is (are) part, along with associated P values. Named according to eartag of G1 male founder |
Total mouse numbers | Will restrict linkage analysis to pedigrees containing a specified range or number of G3 mice |
Allele name (phenotype) | Will return all mutations linked to the specified phenotype, along with associated P values |
Mutation type | Will restrict linkage analysis to the specified mutation type(s): nonsense, missense, makesense, critical splicing, noncritical splicing |
Predicted effect of mutation | Will restrict linkage analysis to the specified mutation effect: probably null (corresponds to nonsense and critical splicing mutations); or probably damaging, possibly damaging, probably benign as determined by PolyPhen-2 |
P value cutoff | Will display genotype–phenotype associations with P (non-linkage) ≤ the value specified; Bonferroni correction may be applied |
Minimum number of HET or VAR mice screened | Will return genotype–phenotype associations tested with at least the specified number of HET (heterozygous) or VAR (homozygous mutant) mice |
‘Raw + Norm’ switch | When applied, enforces P value cutoff for both raw and normalized datasets. Otherwise, enforces P value cutoff for either raw or normalized datasets |
Direction of phenovariance | Quantitative phenotype scores either higher than or lower than wild type scores |
Number of linkage peaks | Will return genotype–phenotype associations for which a specified number of linkage peaks exceed the specified −log10[P(non-linkage)] in the Manhattan plot for recessive, dominant or additive models of linkage. This parameter is useful for filtering results to show only strong, unambiguous genotype–phenotype associations |
Date of data collection | |
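Several of the Table 1 parameters behave as simple filters over a table of associations. The sketch below uses a hypothetical `Association` record and applies a P value cutoff (optionally Bonferroni-corrected) and a minimum number of VAR mice. Dividing the cutoff by the number of associations supplied is an assumption made for illustration; how Linkage Explorer counts tests for its Bonferroni option is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Association:
    gene: str
    model: str       # "recessive", "additive" or "dominant"
    p_value: float   # P(non-linkage)
    n_het: int       # heterozygous mice screened
    n_var: int       # homozygous variant mice screened


def filter_associations(
    assocs: List[Association],
    p_cutoff: float = 0.05,
    bonferroni: bool = False,
    min_var: int = 0,
) -> List[Association]:
    """Keep associations passing the (optionally Bonferroni-corrected) P cutoff
    and screened with at least `min_var` VAR mice."""
    # Bonferroni: divide the nominal cutoff by the number of tests considered.
    threshold = p_cutoff / len(assocs) if bonferroni and assocs else p_cutoff
    return [a for a in assocs if a.p_value <= threshold and a.n_var >= min_var]


table = [
    Association("GeneA", "recessive", 0.001, 10, 3),
    Association("GeneB", "additive", 0.02, 12, 5),
    Association("GeneC", "dominant", 0.04, 8, 1),
    Association("GeneD", "recessive", 0.2, 9, 4),
]
print([a.gene for a in filter_associations(table)])                   # ['GeneA', 'GeneB', 'GeneC']
print([a.gene for a in filter_associations(table, bonferroni=True)])  # ['GeneA']
print([a.gene for a in filter_associations(table, min_var=2)])        # ['GeneA', 'GeneB']
```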
The speed of mapping by Linkage Analyzer now exceeds the rate of production and screening of G3 mice, and linkage assignment occurs within minutes of phenotypic data being entered into the database. This approach has several other advantages. Mapping of quantitative, low-penetrance and weak phenotypes, which may be difficult to assign to affected and non-affected groups, is made possible by the statistical determination of phenovariance and by superpedigree analysis, which increases the power to detect linkage by enlarging the mapping population. Complex traits dependent on two loci can be solved in pedigrees of sufficient size. Moreover, because all mutations in a pedigree are known, not only causative mutations but also non-causative mutations (constrained by a specified P value) can be declared. The approach also permits the measurement of saturation, with an upper limit set by the number of genes tested in the homozygous state with “probably damaging” missense or null alleles, and a lower limit set by the number of genes with null alleles. As with the other mapping strategies described in this review, the limitations of exome capture and massively parallel sequencing apply to our approach. In addition, although the majority of ENU-induced phenotypes have been shown to arise from mutations in coding sequence (Fairfield et al. 2011; Arnold et al. 2012), it remains possible that causative intronic mutations will on rare occasions be missed or attributed to closely linked exonic mutations. Routine CRISPR/Cas9 targeting of implicated genes is therefore used to confirm mapping data.
To date, the Beutler lab has used Linkage Analyzer and Linkage Explorer to test a total of 53,966 mutations in 16,350 genes for their ability to cause phenovariance in 135 screens of immunological function. The mutations were distributed within 22,421 G3 mice from 876 pedigrees. Linkage Analyzer is freely available for download and online data analysis of selected pedigrees via the Mutagenetix website (https://mutagenetix.utsouthwestern.edu/linkage_analysis/linkage_analysis.cfm).
Mutation annotation and consequence
Sequence variant validation typically involves four steps: (i) confirmation of linkage by genotyping, (ii) secondary phenotyping, (iii) cloning the mutation and (iv) producing an alternate allele to confirm the causative allele. With the comprehensive information generated by NGS, producing an alternate allele is a less stringent requirement for confirming the association of a novel gene with a phenotype, because there is little doubt over whether a second, unidentified mutation is responsible for the phenotype, as there was with candidate gene sequencing strategies. Furthermore, the advent of CRISPR/Cas9 technologies and the ready availability of knockout (KO) lines (Koscielny et al. 2014) is a great boon for confirming a functional link between a novel allele or gene and a phenotype. ENU validation is usually accompanied by in silico examination of the mutation’s consequence, its influence on the phenotype and its association with human disease. ENU-induced mutations provide a full range of alleles, including null (loss of function), hypomorphic (reduced function), hypermorphic (gain of function) and neomorphic (novel function) alleles, and better model the genetic variation found in the human genome. Moreover, these mutations can reveal gene functions that would not have been discovered through the analysis of null alleles alone (Qian et al. 2011). Coding causative variants are usually classified by their functional consequence for the coding sequence, namely missense, nonsense, synonymous and splice site mutations. Nonsense and disruptive splice site SNVs are thought to cause loss of function, whereas missense mutations may be either damaging to or tolerated by the protein’s structure and function (Khurana et al. 2013). The current major challenge in analysing genetic variants is interpreting the functional effect a mutation has on the gene and/or genome.
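The classification of coding SNVs into synonymous, missense and nonsense changes follows directly from the standard genetic code. A minimal sketch, assuming a spliced coding sequence (CDS) on the forward strand and a 0-based position within it; splice site variants, which lie outside the CDS, are not handled, and the stop-loss label corresponds to the ‘makesense’ category in Table 1. Function names are hypothetical; annotation tools such as those named earlier handle transcripts, strands and splice regions properly.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution at 0-based `pos` of a forward-strand CDS."""
    cds = cds.upper()
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[: pos - start] + alt.upper() + ref_codon[pos - start + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return f"missense ({ref_aa}>{alt_aa})"


print(classify_coding_snv("ATGTGGTAA", 5, "A"))  # nonsense (TGG -> TGA)
print(classify_coding_snv("ATGCTGTAA", 5, "A"))  # synonymous (CTG -> CTA, both Leu)
print(classify_coding_snv("ATGTGGTAA", 2, "A"))  # missense (M>I)
```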
A variety of methods are available online to predict the functional effects of SNVs. These methods can be classified into categories based on the algorithms they implement (Table 2). Multiple sequence alignment-based tools use information on amino acid conservation among homologous protein sequences at particular positions (Ng and Henikoff 2003; Reva et al. 2011). Other tools combine sequence data with three-dimensional structure to predict the functional impact of an amino acid substitution on the protein. Tools that combine functional annotation with structural data arguably give the best indication of severity. For example, MutationTaster combines information from different data sources, including evolutionary conservation, splice site changes and expression data, and PolyPhen-2 uses a naïve Bayes classifier that implements eleven features, eight sequence-based and three structure-based (Adzhubei et al. 2010; Schwarz et al. 2014). Currently there are 4897 distinct solved protein structures, a limiting factor when assessing mutational consequence; therefore, most predictions involve only a local structure alignment. As protein structure information increases, the accuracy of SNV functional predictions will also increase. This information will inform not only the role of an SNV in protein structure but also its role in protein–protein interactions and post-translational modifications (Ren et al. 2010; Wendl et al. 2011; De Baets et al. 2012). In some cases, information on the SNV-containing protein domain, together with prior knowledge of protein–protein interactions, will be sufficient to determine some of the effects the mutation has on the pathology of disease.
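The intuition behind conservation-based predictors can be shown with a toy score: how often the substituted residue occurs in the same column of a multiple sequence alignment of homologues. This is a simplified illustration loosely inspired by SIFT's convention of treating scores below 0.05 as deleterious; it is not SIFT's algorithm, which builds its alignments from database searches and uses scaled, pseudocount-adjusted probabilities. The column data and function names are hypothetical.

```python
from collections import Counter
from typing import List

STANDARD_AA = 20  # number of standard amino acids, used for pseudocount smoothing


def alt_residue_frequency(column: List[str], alt: str, pseudocount: float = 1.0) -> float:
    """Smoothed frequency of residue `alt` in one alignment column (gaps ignored)."""
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    return (counts[alt] + pseudocount) / (len(residues) + STANDARD_AA * pseudocount)


def predicted_tolerated(column: List[str], alt: str, threshold: float = 0.05) -> bool:
    """Toy rule: substitutions to residues rarely seen among homologues are flagged."""
    return alt_residue_frequency(column, alt) >= threshold


# Hypothetical column: 18 homologues carry Arg (R), 2 carry Lys (K), one has a gap
column = ["R"] * 18 + ["K"] * 2 + ["-"]
print(predicted_tolerated(column, "K"))  # True  (3/40 = 0.075)
print(predicted_tolerated(column, "W"))  # False (1/40 = 0.025)
```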
Table 2. Tools for predicting the functional effects of SNVs, grouped by prediction approach
Tool | URL | Notes | Organism | Reference |
---|---|---|---|---|
Conservation | | | | |
SIFT | http://sift.jcvi.org/ | Predicts effect of SNVs | Human and known mouse SNPs (dbSNP) | (Kumar et al. 2009) |
MutationAssessor | http://mutationassessor.org | Predicts effect of SNVs | Human data: cancer studies | (Reva et al. 2011) |
PROVEAN | http://provean.jcvi.org/ | Predicts effect of SNVs, insertions and deletions | Organism independent | (Choi et al. 2012) |
Structure | | | | |
SNPs3D | http://www.snps3d.org/ | Predictions based on sequence, 3-D structure and biological networks | Human, useful for association studies | (Yue et al. 2006) |
Machine learning/multiple datasets | | | | |
PolyPhen-2 | http://genetics.bwh.harvard.edu/pph2/ | Implements MSA, amino acid changes, evolutionary conservation and SNV site hypermutability. Uses a naïve Bayes classifier | Human; can be adapted for the mouse genome (standalone version) | (Adzhubei et al. 2010) |
MutationTaster2 | http://www.mutationtaster.org/ | Machine learning on evolutionary conservation, splice site changes, gene expression and protein features. Uses a Bayes classifier | Human; uses 1000 Genomes data | (Schwarz et al. 2014) |
SNAP | https://www.rostlab.org/services/snap/ | Uses neural networks on evolutionary conservation, secondary structure and solvent accessibility | Human | (Bromberg and Rost 2007) |
Site Directed Mutator (SDM) | http://mordred.bioc.cam.ac.uk/~sdm/sdm.php | Uses a potential free energy function for protein stability; environment-specific substitution tables are used to calculate stability and predict disease association | Organism independent | (Worth et al. 2011) |
Post-translational modifications | | | | |
PhosSNP | http://phossnp.biocuckoo.org/ | Predicts SNV effect on PTMs | Human | (Ren et al. 2010) |
SNPeffect | http://snpeffect.switchlab.org/ | Predicts SNV effect on PTMs, protein structural features, subcellular localization and interactions | Human | (De Baets et al. 2012) |
Protein–protein interactions | | | | |
MuSiC | http://gmt.genome.wustl.edu/packages/genome-music/ | Predicts SNV effect on pathways in cancer studies, to separate passenger mutations from truly significant mutations | Human | (Dees et al. 2012) |
MSA, multiple sequence alignment; PTM, post-translational modification
The success of phenotype-driven screens in detecting mutants that inform us about biological function is not in doubt, but to date the vast majority of the mutations detected affect coding regions, with only a minority identified in non-coding regions (Lewis et al. 1991; Masuya et al. 2007). It could be argued that this reflects a sampling bias, as only coding and splice regions have been examined in the majority of programmes that employed a candidate gene approach or NGS technologies (Quwailid et al. 2004; Acevedo-Arozena et al. 2008; Andrews et al. 2012; Wang et al. 2015). The debate on the functional contribution of non-coding DNA continues (Consortium 2012; Eddy 2012; Doolittle 2013), but MRC Harwell's data present one of the first unbiased, high-throughput examinations of the link between phenotype and genotype on a stable genetic background in a mammal, enabling us to begin to explore the contribution of non-coding DNA to phenotype. Although the majority (~97.5 %) of randomly induced mutations are detected in non-coding regions, the overwhelming majority of phenotypes identified (41/44) can be assigned to protein changes. This suggests that most ‘function’, in the sense that changing the sequence produces a detectable phenotypic change, is associated with genes. There are caveats, however: phenotyping of mice in the mutant pipeline is not exhaustive and cannot detect every possible phenotype. It is nonetheless an unbiased approach, as the phenotypes detected undergo mapping and then sequencing with no assumption about the underlying genetic lesion. It may also be that non-coding DNA is more tolerant of sequence changes and is thus under-represented.
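The strength of the 41/44 observation can be checked with simple arithmetic. As an illustrative null model of our own (not an analysis reported above), suppose each phenotype-causing lesion were coding with the same probability as a random ENU mutation, about 2.5 % (100 % minus the ~97.5 % non-coding share), and treat "assigned to a protein change" as equivalent to "coding". The Python sketch below computes the expected number of coding lesions among 44 phenotypes and the exact binomial probability of seeing 41 or more under that null.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures from the text; the null model itself is an illustrative assumption.
coding_fraction = 1 - 0.975  # ~2.5 % of induced mutations fall in coding sequence
n_phenotypes = 44            # phenotypes with an assigned causative lesion
n_protein_changes = 41       # of which assigned to protein changes

expected = n_phenotypes * coding_fraction
p_value = binomial_upper_tail(n_phenotypes, n_protein_changes, coding_fraction)
print(f"expected ~{expected:.1f} coding lesions, observed {n_protein_changes}; P = {p_value:.1e}")
```

Only about one coding lesion would be expected under this null, against 41 observed, and the tail probability is vanishingly small (below 10^-60). The caveats above still apply: the null assumes every mutation is equally likely to be detected by the phenotyping pipeline, which is exactly what non-exhaustive phenotyping and the possible tolerance of non-coding DNA call into question.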
As more phenotyping and whole genome sequencing are undertaken, we will gain further information about the links between sequence and phenotype, particularly concerning the contribution of non-coding DNA, but these initial results provide a tantalizing glimpse into the functional analysis of DNA and seem to fit with current hypotheses (Palazzo and Gregory 2014). These results will have a significant impact on the search for causative alleles by deep sequencing of patients, suggesting that the current practice of primarily using next generation sequencing of coding regions will indeed find the majority of causative alleles.
Human correlation
A key goal in understanding human disease and gene dysregulation is to discover and interpret all the genetic variation that occurs in the human population. Advances in sequencing technology and related tools have made it feasible to sequence many human genomes and catalogue this variation. The 1000 Genomes Project, started in 2008, aimed to identify 95 % of the variants with a frequency of ~1 % or higher in the population and to evaluate whether large-scale sequencing can distinguish true variants from artefacts (Genomes Project et al. 2010). The project has provided a catalogue of low- to high-frequency variants that is already supporting the development of genotyping products, as well as a list of background variants to aid the identification of disease-causing and non-disease-causing variants. In parallel, GWAS has become a valuable tool for discovering common variants linked to disease. It is becoming clear that GWAS and other human studies will have a considerable effect on human health, especially as independent studies are starting to report the same genes or variants associated with particular diseases (Abad-Grau et al. 2012). GWAS is increasing our understanding of the genetic etiologies underlying all types of disease, from common to complex. Some reports suggest that certain human diseases are caused not by a single variant but by a combination of multiple common variants with weak effects alongside variants with stronger effects (Visscher et al. 2012). Others find that human diseases are associated with multiple variants acting in unison, each lying within a Mendelian disease-causing locus and potentially deleterious in its own right (Blair et al. 2013). With the methods outlined above, sequencing combined with advanced phenotyping strategies gives us the opportunity to correlate ENU mutations with human disease more effectively, rapidly and accurately.
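At its core, a single-variant GWAS test asks whether an allele is more frequent in cases than in controls. A minimal sketch of one common form of this test, the Pearson chi-square test on a 2 × 2 table of allele counts, is shown below in Python. It is illustrative only: real GWAS software also handles quality control, population structure and covariates, and a genome-wide significance threshold (conventionally 5 × 10^-8) is applied across millions of variants. The function name and the allele counts in the example are hypothetical.

```python
from math import erfc, sqrt


def allelic_chi_square(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> tuple:
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts.

    Returns (statistic, p-value). Counts are alleles, i.e. two per diploid individual.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total  # count if allele and status were independent
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function equals erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical study: 1000 cases and 1000 controls (2000 alleles each).
    stat, p = allelic_chi_square(case_alt=600, case_ref=1400, ctrl_alt=500, ctrl_ref=1500)
    print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

With an alternate-allele frequency of 30 % in cases against 25 % in controls, the statistic is about 12.5 (p of roughly 4 × 10^-4): nominally significant, but far from genome-wide significance, which is why GWAS relies on very large cohorts to detect common variants of weak effect.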
Key advantages of the phenotype-driven approach in mice are the number of mutations that can be induced, the range of phenotyping that can be carried out from birth, and the enhanced ability to discover novelty. Human-based studies still rely heavily on published data, and proving a novel function for a gene, or the association of a novel gene with a particular phenotype, is more difficult than in mouse studies, where functional data are more easily obtained and inheritance can be demonstrated rapidly. This is seen not only in the projects described above but also in other initiatives where mutation detection in NGS data may uncover novel disease-causing variants. For example, modifier screens, in which ENU mutants are sequenced to discover novel genes that alter a phenotype (Rubio-Aliaga et al. 2007), highlight potential therapeutic targets and generate more complex models of disease. Partnerships between human and mouse geneticists, in which human-cohort studies run alongside sequencing of mouse models with similar phenotypes (Tucci et al. 2014), and mouse GWAS-like studies, in which multiple mouse lines with varying phenotype severity are sequenced and genotyped to determine regions of linkage disequilibrium or QTLs, could therefore be extremely beneficial. Only time will tell whether human and mouse sequencing partnerships translate into a clinical setting; in the meantime, such studies continue to advance our understanding of the genetic contribution to disease and physiological processes.
Conclusion
In the present review, we have outlined three disparate methods to detect ENU mutations in NGS data; all three have been successful in finding many causative ENU mutations. A particular method may be better suited to a specific ENU study. For example, the traditional mutation detection method (method 1) may be employed when investigating a single ENU mouse on a mixed background, as gross mapping of the candidate region is relatively easily achieved. Methods 2 and 3 take a population-based approach in which multiple samples are used to predict ENU mutations. Method 2 is an extension of method 1 and is more effective when the ENU mouse is on an inbred background. Method 3 automatically combines phenotype and genotype information in a GWAS-type fashion to generate a linkage region containing the causative gene. As more ENU mutations are characterised, the CRISPR/Cas9 genome editing system will become increasingly valuable as an efficient way to validate them. In addition, CRISPR/Cas9 can be used to mimic deleterious human variants. The future of ENU may lie in combining ENU with CRISPR/Cas9, as this enables both the discovery of novel genetic interactions and the modelling of human disease variants.
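The general idea of combining phenotype and genotype information across several mice can be illustrated with a small Python sketch. This is not the implementation of method 3 or of any published pipeline; it rests on assumptions of our own: a fully penetrant recessive ENU allele, genotype calls coded 0 (homozygous reference), 1 (heterozygous) or 2 (homozygous alternate), and a flag marking variants also observed in unrelated pedigrees, which are treated as pre-existing background variation rather than new ENU lesions. Each remaining variant is scored by the fraction of genotyped mice whose phenotype fits the recessive model. All class names, variant labels and mouse IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateVariant:
    name: str                              # hypothetical label, e.g. "chrN:pos ref>alt"
    genotypes: dict                        # mouse ID -> 0 (hom ref), 1 (het), 2 (hom alt)
    seen_in_other_pedigrees: bool = False  # assumed to flag pre-existing background variation


def recessive_concordance(variant: CandidateVariant, affected: dict) -> float:
    """Fraction of genotyped mice whose phenotype fits a fully penetrant recessive model."""
    called = [mouse for mouse in affected if mouse in variant.genotypes]
    if not called:
        return 0.0
    fits = sum((variant.genotypes[m] == 2) == affected[m] for m in called)
    return fits / len(called)


def rank_candidates(variants, affected):
    """Drop variants shared with other pedigrees, then rank the rest by concordance."""
    private = [v for v in variants if not v.seen_in_other_pedigrees]
    scored = [(v.name, recessive_concordance(v, affected)) for v in private]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical pedigree: two affected and three unaffected mice.
    affected = {"m1": True, "m2": True, "m3": False, "m4": False, "m5": False}
    variants = [
        CandidateVariant("candidate_A", {"m1": 2, "m2": 2, "m3": 1, "m4": 0, "m5": 1}),
        CandidateVariant("candidate_B", {"m1": 2, "m2": 1, "m3": 2, "m4": 0, "m5": 0}),
        CandidateVariant("background_C", {"m1": 2, "m2": 2, "m3": 0, "m4": 0, "m5": 0},
                         seen_in_other_pedigrees=True),
    ]
    for name, score in rank_candidates(variants, affected):
        print(f"{name}: {score:.2f}")
```

Here candidate_A segregates perfectly with the phenotype (score 1.00), candidate_B fits only three of five mice (0.60), and background_C is excluded before scoring because it also occurs in other pedigrees. A real analysis would additionally need to handle incomplete penetrance, dominant models, missing calls and the physical linkage between neighbouring variants.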
References
- Abad-Grau MM, Medina-Medina N, Montes-Soldado R, Matesanz F, Bafna V. Sample reproducibility of genetic association using different multimarker TDTs in genome-wide association studies: characterization and a new approach. PLoS One. 2012;7:e29613. doi: 10.1371/journal.pone.0029613.
- Acevedo-Arozena A, Wells S, Potter P, Kelly M, Cox RD, Brown SD. ENU mutagenesis, a way forward to understand gene function. Annu Rev Genomics Hum Genet. 2008;9:49–69. doi: 10.1146/annurev.genom.9.081307.164224.
- Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–249. doi: 10.1038/nmeth0410-248.
- Andrews TD, Whittle B, Field MA, Balakishnan B, Zhang Y, Shao Y, Cho V, Kirk M, Singh M, Xia Y, et al. Massively parallel sequencing of the mouse exome to accurately identify rare, induced mutations: an immediate source for thousands of new mouse models. Open Biol. 2012;2:120061. doi: 10.1098/rsob.120061.
- Arnold CN, Xia Y, Lin P, Ross C, Schwander M, Smart NG, Muller U, Beutler B. Rapid identification of a disease allele in mouse through whole genome sequencing and bulk segregation analysis. Genetics. 2011;187:633–641. doi: 10.1534/genetics.110.124586.
- Arnold CN, Barnes MJ, Berger M, Blasius AL, Brandl K, Croker B, Crozat K, Du X, Eidenschenk C, Georgel P, et al. ENU-induced phenovariance in mice: inferences from 587 mutations. BMC Res Notes. 2012;5:577. doi: 10.1186/1756-0500-5-577.
- Bainbridge MN, Wang M, Burgess DL, Kovar C, Rodesch MJ, D’Ascenzo M, Kitzman J, Wu YQ, Newsham I, Richmond TA, et al. Whole exome capture in solution with 3 Gbp of data. Genome Biol. 2010;11:R62. doi: 10.1186/gb-2010-11-6-r62.
- Banks G, Heise I, Starbuck B, Osborne T, Wisby L, Potter P, Jackson IJ, Foster RG, Peirson SN, Nolan PM. Genetic background influences age-related decline in visual and nonvisual retinal responses, circadian rhythms, and sleep. Neurobiol Aging. 2015;36:380–393. doi: 10.1016/j.neurobiolaging.2014.07.040.
- Blair DR, Lyttle CS, Mortensen JM, Bearden CF, Jensen AB, Khiabanian H, Melamed R, Rabadan R, Bernstam EV, Brunak S, et al. A nondegenerate code of deleterious variants in Mendelian loci contributes to complex disease risk. Cell. 2013;155:70–80. doi: 10.1016/j.cell.2013.08.030.
- Boles MK, Wilkinson BM, Wilming LG, Liu B, Probst FJ, Harrow J, Grafham D, Hentges KE, Woodward LP, Maxwell A, et al. Discovery of candidate disease genes in ENU-induced mouse mutants by large-scale sequencing, including a splice-site mutation in nucleoredoxin. PLoS Genet. 2009;5:e1000759. doi: 10.1371/journal.pgen.1000759.
- Bromberg Y, Rost B. SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res. 2007;35:3823–3835. doi: 10.1093/nar/gkm238.
- Brown SDM, Moore MW. Towards an encyclopaedia of mammalian gene function: the International Mouse Phenotyping Consortium. Dis Model Mech. 2012;5:289–292. doi: 10.1242/dmm.009878.
- Buchovecky CM, Turley SD, Brown HM, Kyle SM, McDonald JG, Liu B, Pieper AA, Huang W, Katz DM, Russell DW, et al. A suppressor screen in Mecp2 mutant mice implicates cholesterol metabolism in Rett syndrome. Nat Genet. 2013;45:1013–1020. doi: 10.1038/ng.2714.
- Bull KR, Rimmer AJ, Siggs OM, Miosge LA, Roots CM, Enders A, Bertram EM, Crockford TL, Whittle B, Potter PK, et al. Unlocking the bottleneck in forward genetics using whole-genome sequencing and identity by descent to isolate causative mutations. PLoS Genet. 2013;9:e1003219. doi: 10.1371/journal.pgen.1003219.
- Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the functional effect of amino acid substitutions and indels. PLoS One. 2012;7:e46688. doi: 10.1371/journal.pone.0046688.
- Church DM, Goodstadt L, Hillier LW, Zody MC, Goldstein S, She X, Bult CJ, Agarwala R, Cherry JL, DiCuccio M, et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 2009;7:e1000112. doi: 10.1371/journal.pbio.1000112.
- Coffey AJ, Kokocinski F, Calafato MS, Scott CE, Palta P, Drury E, Joyce CJ, Leproust EM, Harrow J, Hunt S, et al. The GENCODE exome: sequencing the complete human exome. Eur J Hum Genet. 2011;19:827–831. doi: 10.1038/ejhg.2011.28.
- Consortium EP. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- De Baets G, Van Durme J, Reumers J, Maurer-Stroh S, Vanhee P, Dopazo J, Schymkowitz J, Rousseau F. SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants. Nucleic Acids Res. 2012;40:D935–D939. doi: 10.1093/nar/gkr996.
- Dees ND, Zhang Q, Kandoth C, Wendl MC, Schierding W, Koboldt DC, Mooney TB, Callaway MB, Dooling D, Mardis ER, et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 2012;22:1589–1598. doi: 10.1101/gr.134635.111.
- DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–498. doi: 10.1038/ng.806.
- Doolittle WF. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA. 2013;110:5294–5300. doi: 10.1073/pnas.1221376110.
- Eddy SR. The C-value paradox, junk DNA and ENCODE. Curr Biol. 2012;22:R898–R899. doi: 10.1016/j.cub.2012.10.002.
- Fairfield H, Gilbert GJ, Barter M, Corrigan RR, Curtain M, Ding Y, D’Ascenzo M, Gerhardt DJ, He C, Huang W, et al. Mutation discovery in mice by whole exome sequencing. Genome Biol. 2011;12:R86. doi: 10.1186/gb-2011-12-9-r86.
- Genomes Project C. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534.
- Grant JR, Arantes AS, Liao X, Stothard P. In-depth annotation of SNPs arising from sequencing projects using NGS-SNP. Bioinformatics. 2011;27:2300–2301. doi: 10.1093/bioinformatics/btr372.
- Hrabe de Angelis MH, Flaswinkel H, Fuchs H, Rathkolb B, Soewarto D, Marschall S, Heffner S, Pargent W, Wuensch K, Jung M, et al. Genome-wide, large-scale production of mutant mice by ENU mutagenesis. Nat Genet. 2000;25:444–447. doi: 10.1038/78146.
- Jonczyk MS, Simon M, Kumar S, Fernandes VE, Sylvius N, Mallon AM, Denny P, Andrew PW. Genetic factors regulating lung vasculature and immune cell functions associate with resistance to pneumococcal infection. PLoS One. 2014;9:e89831. doi: 10.1371/journal.pone.0089831.
- Justice MJ, Carpenter DA, Favor J, Neuhauser-Klaus A, Hrabe de Angelis M, Soewarto D, Moser A, Cordes S, Miller D, Chapman V, et al. Effects of ENU dosage on mouse strains. Mamm Genome. 2000;11:484–488. doi: 10.1007/s003350010094.
- Khurana E, Fu Y, Chen J, Gerstein M. Interpretation of genomic variants using a unified biological network approach. PLoS Comput Biol. 2013;9:e1002886. doi: 10.1371/journal.pcbi.1002886.
- Kircher M, Kelso J. High-throughput DNA sequencing–concepts and limitations. Bioessays. 2010;32:524–536. doi: 10.1002/bies.200900181.
- Koscielny G, Yaikhom G, Iyer V, Meehan TF, Morgan H, Atienza-Herrero J, Blake A, Chen CK, Easty R, Di Fenza A, et al. The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data. Nucleic Acids Res. 2014;42:D802–D809. doi: 10.1093/nar/gkt977.
- Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4:1073–1081. doi: 10.1038/nprot.2009.86.
- Kurapati R, McKenna C, Lindqvist J, Williams D, Simon M, LeProust E, Baker J, Cheeseman M, Carroll N, Denny P, et al. Myofibrillar myopathy caused by a mutation in the motor domain of mouse MyHC IIb. Hum Mol Genet. 2012;21:1706–1724. doi: 10.1093/hmg/ddr605.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
- Ledergerber C, Dessimoz C. Base-calling for next-generation sequencing platforms. Brief Bioinform. 2011;12:489–497. doi: 10.1093/bib/bbq077.
- Leshchiner I, Alexa K, Kelsey P, Adzhubei I, Austin-Tse CA, Cooney JD, Anderson H, King MJ, Stottmann RW, Garnaas MK, et al. Mutation mapping and identification by whole-genome sequencing. Genome Res. 2012;22:1541–1548. doi: 10.1101/gr.135541.111.
- Lewis SE, Barnett LB, Sadler BM, Shelby MD. ENU mutagenesis in the mouse electrophoretic specific-locus test, 1. Dose-response relationship of electrophoretically-detected mutations arising from mouse spermatogonia treated with ethylnitrosourea. Mutat Res. 1991;249:311–315. doi: 10.1016/0027-5107(91)90005-9.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- Liu XT, Han SZ, Wang ZH, Gelernter J, Yang BZ. Variant callers for next-generation sequencing data: a comparison study. PLoS One. 2013;8:e75619. doi: 10.1371/journal.pone.0075619.
- Mardis ER. Next-generation sequencing platforms. Annu Rev Anal Chem (Palo Alto Calif). 2013;6:287–303. doi: 10.1146/annurev-anchem-062012-092628.
- Masuya H, Sezutsu H, Sakuraba Y, Sagai T, Hosoya M, Kaneda H, Miura I, Kobayashi K, Sumiyama K, Shimizu A, et al. A series of ENU-induced single-base substitutions in a long-range cis-element altering Sonic hedgehog expression in the developing mouse limb bud. Genomics. 2007;89:207–214. doi: 10.1016/j.ygeno.2006.09.005.
- McLaren W, Pritchard B, Rios D, Chen YA, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26:2069–2070. doi: 10.1093/bioinformatics/btq330.
- Mouse Genome Sequencing C. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
- Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31:3812–3814. doi: 10.1093/nar/gkg509.
- Nolan PM, Peters J, Strivens M, Rogers D, Hagan J, Spurr N, Gray IC, Vizor L, Brooker D, Whitehill E, et al. A systematic, genome-wide, phenotype-driven mutagenesis programme for gene function studies in the mouse. Nat Genet. 2000;25:440–443. doi: 10.1038/78140.
- Palazzo AF, Gregory TR. The case for junk DNA. PLoS Genet. 2014;10:e1004351. doi: 10.1371/journal.pgen.1004351.
- Pirooznia M, Kramer M, Parla J, Goes FS, Potash JB, McCombie WR, Zandi PP. Validation and assessment of variant calling pipelines for next-generation sequencing. Hum Genomics. 2014;8:14. doi: 10.1186/1479-7364-8-14.
- Potter P, Wisby L, Blease A, Simon M. Novel gene function revealed by mouse mutagenesis screens for models of age-related disease. Nat Commun. 2015. Under review.
- Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, et al. The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009;19:1316–1323. doi: 10.1101/gr.080531.108.
- Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014;42:D756–D763. doi: 10.1093/nar/gkt1114.
- Qian L, Mahaffey JP, Alcorn HL, Anderson KV. Tissue-specific roles of Axin2 in the inhibition and activation of Wnt signaling in the mouse embryo. Proc Natl Acad Sci USA. 2011;108:8692–8697. doi: 10.1073/pnas.1100328108.
- Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, Bertoni A, Swerdlow HP, Gu Y. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012;13:341. doi: 10.1186/1471-2164-13-341.
- Quwailid MM, Hugill A, Dear N, Vizor L, Wells S, Horner E, Fuller S, Weedon J, McMath H, Woodman P, et al. A gene-driven ENU-based approach to generating an allelic series in any gene. Mamm Genome. 2004;15:585–591. doi: 10.1007/s00335-004-2379-z.
- Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77:257–286. doi: 10.1109/5.18626.
- Ren J, Jiang C, Gao X, Liu Z, Yuan Z, Jin C, Wen L, Zhang Z, Xue Y, Yao X. PhosSNP for systematic analysis of genetic polymorphisms that influence protein phosphorylation. Mol Cell Proteomics. 2010;9:623–634. doi: 10.1074/mcp.M900273-MCP200.
- Reva B, Antipin Y, Sander C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 2011;39:e118. doi: 10.1093/nar/gkr407.
- Rimmer A, Phan H, Mathieson I, Iqbal Z, Twigg SR, Consortium WGS, Wilkie AO, McVean G, Lunter G. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat Genet. 2014;46:912–918. doi: 10.1038/ng.3036.
- Rubio-Aliaga I, Soewarto D, Wagner S, Klaften M, Fuchs H, Kalaydjiev S, Busch DH, Klempt M, Rathkolb B, Wolf E, et al. A genetic screen for modifiers of the delta1-dependent notch signaling function in the mouse. Genetics. 2007;175:1451–1463. doi: 10.1534/genetics.106.067298.
- Russell WL, Hunsicker PR, Carpenter DA, Cornett CV, Guinn GM. Effect of dose fractionation on the ethylnitrosourea induction of specific-locus mutations in mouse spermatogonia. Proc Natl Acad Sci USA. 1982;79:3592–3593. doi: 10.1073/pnas.79.11.3592.
- Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods. 2014;11:361–362. doi: 10.1038/nmeth.2890.
- Simon MM, Mallon AM, Howell GR, Reinholdt LG. High throughput sequencing approaches to mutation discovery in the mouse. Mamm Genome. 2012;23:499–513. doi: 10.1007/s00335-012-9424-0.
- Simon MM, Greenaway S, White JK, Fuchs H, Gailus-Durner V, Wells S, Sorg T, Wong K, Bedu E, Cartwright EJ, et al. A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains. Genome Biol. 2013;14:R82. doi: 10.1186/gb-2013-14-7-r82.
- Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 2014;15:121–132. doi: 10.1038/nrg3642.
- Stamatoyannopoulos JA. What does our genome encode? Genome Res. 2012;22:1602–1611. doi: 10.1101/gr.146506.112.
- Sun M, Mondal K, Patel V, Horner VL, Long AB, Cutler DJ, Caspary T, Zwick ME. Multiplex chromosomal exome sequencing accelerates identification of ENU-induced mutations in the mouse. G3. 2012;2:143–150. doi: 10.1534/g3.111.001669.
- Tucci V, Kleefstra T, Hardy A, Heise I, Maggi S, Willemsen MH, Hilton H, Esapa C, Simon M, Buenavista MT, et al. Dominant beta-catenin mutations cause intellectual disability with recognizable syndromic features. J Clin Invest. 2014;124:1468–1482. doi: 10.1172/JCI70372.
- Vinuesa CG, Goodnow CC. Illuminating autoimmune regulators through controlled variation of the mouse genome sequence. Immunity. 2004;20:669–679. doi: 10.1016/j.immuni.2004.05.012.
- Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90:7–24. doi: 10.1016/j.ajhg.2011.11.029.
- Voelkerding KV, Dames SA, Durtschi JD. Next-generation sequencing: from basic research to diagnostics. Clin Chem. 2009;55:641–658. doi: 10.1373/clinchem.2008.112789.
- Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603.
- Wang T, Zhan X, Bu CH, Lyon S, Pratt D, Hildebrand S, Choi JH, Zhang Z, Zeng M, Wang KW, et al. Real-time resolution of point mutations that cause phenovariance in mice. Proc Natl Acad Sci USA. 2015;112:E440–E449. doi: 10.1073/pnas.1423216112.
- Wendl MC, Wallis JW, Lin L, Kandoth C, Mardis ER, Wilson RK, Ding L. PathScan: a tool for discerning mutational significance in groups of putative cancer genes. Bioinformatics. 2011;27:1595–1602. doi: 10.1093/bioinformatics/btr193.
- Worth CL, Preissner R, Blundell TL. SDM–a server for predicting effects of mutations on protein stability and malfunction. Nucleic Acids Res. 2011;39:W215–W222. doi: 10.1093/nar/gkr363.
- Yue P, Melamud E, Moult J. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics. 2006;7:166. doi: 10.1186/1471-2105-7-166.