Author manuscript; available in PMC: 2022 Aug 19.
Published in final edited form as: Methods Mol Biol. 2022;2468:257–269. doi: 10.1007/978-1-0716-2181-3_13

Mutation Mapping and Identification by Whole-Genome Sequencing

Harold E Smith
PMCID: PMC9389619  NIHMSID: NIHMS1828846  PMID: 35320569

Abstract

Geneticists approach biology with a simple question: which genes are required for the pathway or process of interest? Classical genetic screens (aka forward genetics) in model organisms such as Caenorhabditis elegans have been the method of choice for answering that question. Next-generation sequencing provides the means to generate a comprehensive list of sequence variants, including the mutation of interest. Herein is described a workflow for sample preparation and data analysis to allow the simultaneous mapping and identification of candidate mutations by whole-genome sequencing in Caenorhabditis elegans.

Keywords: Sequence variant, Polymorphism mapping, NGS library construction, Whole-genome sequencing, Mutation analysis

1. Introduction

For more than a century, researchers have used classical genetics (random mutagenesis and phenotypic screening) in model species to identify the genes involved in biological pathways [1, 2]. The availability of both self and outcross modes of reproduction makes Caenorhabditis elegans ideally suited for such genetic screens. Historically, the ability to generate interesting mutations in worms has greatly exceeded the ability to identify causative sequence variants: dozens or even hundreds of alleles might be recovered in a single mutant screen (e.g., [3, 4]). However, the application of next-generation sequencing technology (NGS) to mutation identification in C. elegans has largely eliminated this bottleneck [5–7]. Whole-genome sequencing provides a catalog of all sequence variants within the strain of interest. By crossing the strain into a highly polymorphic genetic background, both positional information and novel mutations can be obtained [8, 9]. Candidate genes identified in this manner can be validated easily by secondary screening such as RNAi phenocopy [10] or CRISPR-mediated genome engineering [11].

The workflow for sample preparation for whole-genome sequencing consists of the following steps: (1) crossing the mutation-bearing strain to a polymorphic strain and picking homozygous F2 progeny; (2) isolating and shearing the genomic DNA (gDNA); and (3) constructing a sequencing library from the sheared gDNA sample. The indicated protocol produces sequence-ready libraries and is designed to accommodate phenotypes (e.g., nonconditional lethality or sterility) that are typically difficult to analyze. The only limitation is that the allele of interest be recessive, so that homozygous and heterozygous segregants can be distinguished. Sequencing per se is not described, as this service is typically performed by a core facility.

Each step in this workflow is subject to considerable variation. The protocol is designed for monogenic recessive alleles, but can be adapted for more complex variants [12]. Hawaiian strain CB4856 is the polymorphic strain used for the mapping cross, but other strains can be employed [e.g., 13]. The picking of F2 progeny (step 1) will be determined by the mutant phenotype in question. The gDNA isolation (step 2) is intended for small numbers of worms, but can be used for bulk samples as well. The library construction protocol (step 3) is specific to short-read (~100 bases) sequencing on an Illumina instrument (the most common and cost-effective platform for this application); the use of other sequencers will necessitate platform-specific sample preparation. In all cases, the reader is referred to the Notes section for relevant parameters to consider when using alternative methods.

The workflow for data analysis consists of the following steps: filtering to remove low-quality and contaminating data; alignment to the reference genome; post-processing of the aligned data; variant calling; extraction of Hawaiian SNPs for mapping; and annotation of novel variants to identify candidate mutations. The software pipeline described here performs well for C. elegans mutation mapping [14], but requires a computer running the Linux operating system plus some familiarity with command-line instructions. For users who prefer a web-based environment, the public Galaxy platform [15] provides a variety of software tools for building a custom workflow. It also offers MiModD [16], a beginning-to-end software pipeline for variant detection.

2. Materials

2.1. Worm Growth and Mating

  1. Dissecting microscope.

  2. Worm pick (see Note 1).

  3. NGM plates: In 2 L flask, mix 3 g NaCl, 17 g agar, 2.5 g peptone, and 975 mL H2O. Autoclave to sterilize. Cool to 55 °C. Add 1 mL cholesterol (5 mg/mL in ethanol; do not autoclave), 1 mL 1 M CaCl2, 1 mL 1 M MgSO4, and 25 mL 1 M KPO4, pH 6.0 (108.3 g KH2PO4, 35.6 g K2HPO4. H2O, 1 L H2O) (see Note 2). Dispense 14 mL into 6 cm petri plates. Cool plates to room temperature. Seed plates with ~50 μL OP50 culture (grown overnight in LB medium at 37 °C). Incubate seeded plates overnight at room temperature (see Note 3).

  4. E. coli strain OP50 (see Note 4).

  5. C. elegans strain CB4856 (see Note 4).

2.2. gDNA Isolation

  1. M9 buffer: Mix 3 g KH2PO4, 6 g Na2HPO4, 5 g NaCl, and 1 mL 1 M MgSO4 in 1 L H2O; autoclave to sterilize.

  2. TE buffer: Mix 10 mL 1 M Tris–HCl, pH 8.0 and 2 mL 0.5 M EDTA, pH 8.0 in 988 mL H2O; autoclave to sterilize.

  3. Worm lysis buffer: Mix 100 μL 1 M Tris–HCl (pH 8.0), 20 μL 5 M NaCl, 100 μL 0.5 M EDTA, 125 μL 10% (v/v) SDS, and 655 μL H2O.

  4. Proteinase K, 10 mg/mL concentration.

  5. RNase A, 10 mg/mL concentration.

  6. Sonicating water bath.

  7. Low elution volume PCR purification kit.
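
The worm lysis buffer recipe above can be checked with a short dilution calculation. The following is a minimal sketch, assuming only the stock concentrations and volumes listed in item 3; the function name `final_conc` is hypothetical and introduced here for illustration.

```shell
#!/bin/sh
# final_conc STOCK STOCK_VOL TOTAL_VOL
# Prints the final concentration (in the same unit as STOCK) after dilution,
# using C_final = C_stock * V_stock / V_total.
final_conc() {
    awk -v c="$1" -v v="$2" -v t="$3" 'BEGIN { printf "%g\n", c * v / t }'
}

# Total volume of the lysis buffer recipe, in microliters.
total=$((100 + 20 + 100 + 125 + 655))

echo "Tris-HCl (mM): $(final_conc 1000 100 "$total")"
echo "NaCl (mM):     $(final_conc 5000 20 "$total")"
echo "EDTA (mM):     $(final_conc 500 100 "$total")"
echo "SDS (% v/v):   $(final_conc 10 125 "$total")"

# Adding 400 uL of buffer to ~100 uL of washed worms gives ~500 uL of lysate.
echo "SDS in lysate (% v/v): $(final_conc 1.25 400 500)"
```

Under these assumptions the lysate contains about 1% SDS, which matches the figure quoted in Note 14 for column compatibility.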

2.3. Library Construction

  1. DNA library prep kit for NGS.

  2. Multiplex oligos for NGS.

  3. EB: 10 mM Tris–HCl, pH 8.0 or 8.5.

  4. Paramagnetic beads for high-throughput purification.

  5. Magnetic tube stand.

  6. 80% (v/v) ethanol; prepare fresh immediately before use.

2.4. Data Analysis

  1. Computer workstation with (minimum) eight cores, 16 GB RAM, 1 TB hard drive storage.

  2. Linux-type operating system (popular options include CentOS, Ubuntu, or the commercial distribution Red Hat Enterprise Linux).

3. Methods

3.1. Mating and F2 Selection (See Note 5)

  1. Using a dissecting microscope and worm pick, transfer two to three young adult hermaphrodites containing the mutation of interest to an NGM plate seeded with OP50. Pick 10–12 adult males of strain CB4856 to the same plate. Allow to mate overnight at room temperature (see Note 6).

  2. Transfer each mated hermaphrodite to an individual fresh-seeded NGM plate. Incubate at room temperature until the F1 progeny begin to reach the L4/young adult stages (~4 days) (see Note 7).

  3. Pick 10–12 F1 L4/young adult hermaphrodites to a fresh-seeded NGM plate. Incubate at room temperature until F2 progeny can be scored for the mutant phenotype (see Note 8).

3.2. gDNA Isolation

  1. Pick homozygous F2 progeny into a 1.5 mL centrifuge tube containing 500 μL M9 (see Note 9).

  2. Vortex briefly (3–5 s), then spin 60 s at 1300 RCF.

  3. Remove most of the M9 by pipette, taking care to avoid the pellet.

  4. Resuspend in 500 μL M9 and repeat the wash at least four times (see Note 10).

  5. Perform a final wash with 500 μL TE; spin 1 min at top speed; remove TE, leaving ~100 μL (see Note 11).

  6. Add 400 μL worm lysis buffer to the worm sample; mix briefly.

  7. Sonicate with the BioRuptor using the following settings: high power; 30 s on/30 s off; 2 × 15 min sonication time (see Note 12).

  8. Add 50 μL proteinase K; mix well; incubate 1 h at 65 °C, vortexing briefly at 10–15 min intervals to maintain suspension (see Note 13).

  9. Add 20 μL RNase A; incubate 30 min at 37 °C.

  10. Purify sheared gDNA using a low elution volume purification column per the manufacturer’s protocol; the final elution volume is 10 μL (see Notes 14 and 15).

3.3. Library Construction

3.3.1. End Prep Reaction

  1. Mix in 0.5 mL tube (see Note 16): 10 μL sheared gDNA input, 40 μL EB or water, 3 μL 10× End Prep Enzyme Mix, 7 μL End Prep Reaction Buffer.

  2. Incubate in a thermal cycler: 30 min at 20 °C; 30 min at 65 °C; Hold at 4 °C.

3.3.2. Adapter Ligation Reaction

  1. Dilute the adapter immediately before use (see Note 17).

  2. Add the following to the End Prep reaction and mix (see Note 16): 30 μL Ligation Master Mix, 1 μL Ligation Enhancer, 2.5 μL diluted adapter.

  3. Incubate in a thermal cycler (the heated lid should be off): 15 min at 20 °C.

  4. Add 3 μL USER enzyme and mix well.

  5. Incubate in a thermal cycler (the heated lid should be on): 15 min at 37 °C; Hold at 4 °C (see Note 18).

3.3.3. Paramagnetic-Bead Clean-up (See Note 19)

  1. Remove the reaction tube from the thermal cycler; add 87 μL beads; mix well by vortexing (see Note 20).

  2. Incubate the reaction tube 5 min on the benchtop.

  3. Place the reaction tube in the magnetic stand; incubate 5 min; remove and discard the supernatant (see Note 21).

  4. Leave the reaction tube in the magnetic stand; add 200 μL freshly prepared 80% (v/v) ethanol; incubate 30 s; carefully remove and discard the supernatant.

  5. Repeat step 4 once, for a total of two washes. If necessary, spin briefly, return to the magnetic stand, and remove residual ethanol with a 10 μL pipet.

  6. Air-dry the beads in the open tube for 5 min on the magnetic stand.

  7. Add 17 μL EB to elute the DNA; pipette or vortex to resuspend the bead pellet; incubate for 2 min on the benchtop; return to the magnetic stand for 5 min; transfer 15 μL of supernatant (containing the DNA) to a new 0.5 mL tube (see Note 18).
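
The bead volumes used in the two clean-ups can be related to the reaction volumes with a quick calculation. This is a sketch that sums the volumes given in Subheadings 3.3.1, 3.3.2, and 3.3.4; the function name `bead_ratio` is hypothetical, and the resulting ratio is derived here rather than stated by the protocol.

```shell
#!/bin/sh
# bead_ratio BEAD_UL REACTION_UL
# Prints the volume of beads added per volume of reaction.
bead_ratio() {
    awk -v b="$1" -v r="$2" 'BEGIN { printf "%.2f\n", b / r }'
}

# End prep (10 + 40 + 3 + 7) plus ligation (30 + 1 + 2.5) plus USER enzyme (3).
ligation_ul=$(awk 'BEGIN { print 10 + 40 + 3 + 7 + 30 + 1 + 2.5 + 3 }')

# Eluted DNA (15) plus PCR Master Mix (25) plus the two primers (5 + 5).
pcr_ul=$((15 + 25 + 5 + 5))

echo "Post-ligation clean-up: 87 uL beads / $ligation_ul uL = $(bead_ratio 87 "$ligation_ul")"
echo "Post-PCR clean-up:      45 uL beads / $pcr_ul uL = $(bead_ratio 45 "$pcr_ul")"
```

Both clean-ups work out to roughly 0.9 volumes of beads per volume of reaction, which may be a useful starting point if reaction volumes are scaled; consult the bead manufacturer's guidance before changing the ratio.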

3.3.4. PCR Amplification

  1. Add the following to the DNA sample and mix: 25 μL PCR Master Mix, 5 μL index primer, 5 μL universal PCR primer.

  2. Incubate in a thermal cycler. One cycle of: 30 s at 98 °C; Multiple cycles (see Note 22) of: 10 s at 98 °C, 75 s at 65 °C; One cycle of: 5 min at 65 °C; Hold at 4 °C.
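
Two parameters in the library protocol depend on the amount of sheared gDNA input: the adapter dilution (Note 17) and the number of PCR cycles (Note 22). The following sketch simply encodes the thresholds from those two notes; the function name `library_params` and the output layout are hypothetical.

```shell
#!/bin/sh
# library_params INPUT_NG
# Prints the adapter dilution, diluted adapter concentration, and PCR cycle
# number for a given gDNA input, using the 5 ng threshold from Notes 17 and 22.
library_params() {
    awk -v ng="$1" 'BEGIN {
        if (ng >= 5)
            print "adapter_dilution=1:10 adapter_uM=1.5 pcr_cycles=8"
        else
            print "adapter_dilution=1:25 adapter_uM=0.6 pcr_cycles=10"
    }'
}

library_params 8    # input of 8 ng
library_params 3    # input of 3 ng
```

The sketch does not check that the input falls within the 2–10 ng range given in Note 15; inputs outside that range may need different conditions.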

3.3.5. Final Paramagnetic Bead Clean-up (See Note 23)

  1. Follow the protocol above (Subheading 3.3.3) with the following differences: In step 1, add 45 μL beads; in step 7, add 35 μL EB; transfer 30 μL to a new tube. This purified library can be stored indefinitely at −20 °C (see Note 24).
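
When planning a sequencing run, the recommended depth in Note 24 (20-fold coverage, or 2 × 10^9 bp) can be converted into a number of reads. This is a minimal sketch, assuming a genome size of 1 × 10^8 bp (implied by those two figures) and single-end reads of about 100 bases; the function name `reads_needed` is hypothetical.

```shell
#!/bin/sh
# reads_needed GENOME_BP COVERAGE READ_LEN
# Prints the number of reads required to reach COVERAGE-fold depth,
# rounded up to the next whole read.
reads_needed() {
    awk -v g="$1" -v c="$2" -v l="$3" \
        'BEGIN { printf "%d\n", (g * c + l - 1) / l }'
}

# 20-fold coverage of a 100 Mb genome with 100-base single-end reads.
reads_needed 100000000 20 100
```

Under these assumptions each library needs about 20 million reads. Dividing the read output of a run by this number gives a rough upper limit on how many libraries can be multiplexed, before allowing for reads lost to filtering and duplicate removal.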

3.4. Data Analysis

3.4.1. Installation and Configuration (See Note 25)

  1. Download and install the following software tools from the indicated links (See Notes 26 and 27): BBMap: https://sourceforge.net/projects/bbmap/; SAMtools and BCFtools: http://www.htslib.org/download/; FreeBayes: https://github.com/ekg/freebayes; BEDtools: https://bedtools.readthedocs.io/en/latest/; ANNOVAR: https://annovar.openbioinformatics.org/en/latest/user-guide/download/; gtfToGenePred: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

  2. Download copies of C. elegans datasets at https://www.wormbase.org. Select “Downloads,” then “Public FTP Site,” connect as “Guest” if prompted, and follow the directory path “releases/WS278/species/c_elegans/PRJNA13758”.

    For the reference genome (see Note 28) in FASTA format, select: “c_elegans.PRJNA13758.WS278.genomic.fa.gz”.

    For the gene list in GTF format, select: “c_elegans.PRJNA13758.WS278.canonical_geneset.gtf.gz”.

    For the list of common gene names, select: “annotation/c_elegans.PRJNA13758.WS278.geneIDs.txt.gz”.

  3. Download a copy of the E. coli reference genome (used to remove contaminating sequences derived from the OP50 food source) at https://bacteria.ensembl.org/Escherichia_coli_str_k_12_substr_mg1655/Info/Index. Select “Download DNA Sequence (FASTA),” connect as “Guest” if prompted, and then select “Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna.chromosome.Chromosome.fa.gz”.

  4. Download a copy of the Hawaiian SNPs in VCF format at https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0174446.s009&type=supplementary

  5. Create a directory for the downloaded reference files, then extract and move/rename them (see Note 29).
    $ mkdir /PATH/TO/reference_genomes
    $ mkdir /PATH/TO/reference_genomes/WS278_ref
    $ gunzip c_elegans.PRJNA13758.WS278.genomic.fa.gz
    $ mv c_elegans.PRJNA13758.WS278.genomic.fa \
    /PATH/TO/reference_genomes/WS278_ref/Celegans.fa
    $ gunzip c_elegans.PRJNA13758.WS278.canonical_geneset.gtf.gz
    $ mv c_elegans.PRJNA13758.WS278.canonical_geneset.gtf \
    /PATH/TO/reference_genomes/WS278_ref/Celegans_geneset.gtf
    $ gunzip c_elegans.PRJNA13758.WS278.geneIDs.txt.gz
    $ mv c_elegans.PRJNA13758.WS278.geneIDs.txt \
    /PATH/TO/reference_genomes/WS278_ref/Celegans_geneIDs.txt
    $ gunzip Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna.chromosome.Chromosome.fa.gz
    $ mv Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna.chromosome.Chromosome.fa \
    /PATH/TO/reference_genomes/Ecoli.fa
    $ mv journal.pone.0174446.s009.txt /PATH/TO/reference_genomes/Hawaiian_snps.vcf
    
  6. Index the C. elegans reference genome for use with SAMtools.
    $ cd /PATH/TO/reference_genomes/WS278_ref
    $ /PATH/TO/SAMTOOLS/samtools faidx Celegans.fa
    
  7. Convert/generate supplementary files for ANNOVAR.
    $ /PATH/TO/gtfToGenePred -genePredExt Celegans_geneset.gtf WS278_refGene.txt
    $ perl /PATH/TO/ANNOVAR/retrieve_seq_from_fasta.pl --format refGene \
    --seqfile Celegans.fa WS278_refGene.txt --out WS278_refGeneMrna.fa
    $ grep 'protein_coding_gene' Celegans_geneIDs.txt | sed 's/,,/,/g' \
    | awk -F "," 'BEGIN{OFS="\t";} {print $2,$3}' > WS278.gene_xref.txt
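
To see what the gene cross-reference pipeline in step 7 produces, it can be run on a few toy records. This is a sketch only: the records below are made up, and the six-column layout (taxon ID, WBGene ID, gene name, sequence name, status, biotype) is an assumption about the geneIDs file that should be checked against the downloaded copy. The wrapper function `make_xref` is hypothetical.

```shell
#!/bin/sh
# make_xref: reads geneIDs records on standard input and prints a
# tab-separated table of WBGene ID and gene name for protein-coding genes.
# The sed step collapses an empty name field, so genes that lack a common
# name fall back to the sequence name in the next column.
make_xref() {
    grep 'protein_coding_gene' | sed 's/,,/,/g' \
        | awk -F "," 'BEGIN{OFS="\t";} {print $2,$3}'
}

# Made-up records for illustration (not real WormBase entries).
make_xref <<'EOF'
6239,WBGene90000001,abc-1,T00A1.1,Live,protein_coding_gene
6239,WBGene90000002,,T00A1.2,Live,protein_coding_gene
6239,WBGene90000003,,T00A1.3,Live,ncRNA_gene
EOF
```

The first record keeps its common name, the second falls back to its sequence name, and the third is dropped because it is not annotated as protein coding.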
    

3.4.2. Pre-processing of Sequence Data

  1. Download the sequence data file (see Note 30), create a directory named “data,” extract the sequence data if compressed, move it to the “data” directory, and change your working directory to “data”.
    $ mkdir /PATH/TO/data
    $ gunzip SEQUENCE_DATA.FQ.GZ    # optional: only if the file is compressed
    $ mv SEQUENCE_DATA.FQ /PATH/TO/data/
    $ cd /PATH/TO/data/
    
  2. Remove contaminating E. coli sequences (see Note 31).
    $ /PATH/TO/BBMAP/bbduk.sh in=SEQUENCE_DATA.FQ out=FILTERED_DATA.FQ \
    ref=/PATH/TO/reference_genomes/Ecoli.fa k=31 hdist=1
    
  3. Remove contaminating adapter sequences (see Note 32).
    $ /PATH/TO/BBMAP/bbduk.sh in=FILTERED_DATA.FQ out=FILTERED_2_DATA.FQ \
    ref=/PATH/TO/ADAPTERS.FA ktrim=r k=23 hdist=1 mink=11 minlen=25 tpe tbo
    
  4. Remove low-quality sequences.
    $ /PATH/TO/BBMAP/bbduk.sh in=FILTERED_2_DATA.FQ out=TRIMMED_DATA.FQ \
    qtrim=rl trimq=10 minlen=25
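
It can be helpful to check how many reads survive each filtering step, for example to judge how much E. coli contamination was present. The following is a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per read; the function names `count_reads` and `percent_retained` are hypothetical.

```shell
#!/bin/sh
# count_reads [FILE]
# Prints the number of reads in an uncompressed FASTQ file (or standard
# input), assuming four lines per record.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$@"
}

# percent_retained BEFORE_COUNT AFTER_COUNT
# Prints the percentage of reads remaining after a filtering step.
percent_retained() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { if (a > 0) printf "%.1f\n", 100 * b / a; else print "NA" }'
}

# Toy example with two reads.
count_reads <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGCAAGC
+
IIIIIIII
EOF

# Typical use with the files from this subheading:
#   before=$(count_reads SEQUENCE_DATA.FQ)
#   after=$(count_reads TRIMMED_DATA.FQ)
#   percent_retained "$before" "$after"
```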
    

3.4.3. Alignment of Pre-processed Data to the C. elegans Reference Genome

$ /PATH/TO/BBMAP/bbmap.sh ref=/PATH/TO/reference_genomes/WS278_ref/Celegans.fa \
sam=1.3 ambiguous=toss in=TRIMMED_DATA.FQ out=DATA.BAM

3.4.4. Post-processing of the Aligned Data for Variant Calling (Sort, Remove Duplicates, and Index)

$ /PATH/TO/SAMTOOLS/samtools sort -O bam -o SORTED_DATA.BAM -T TEMP \
-@ 8 DATA.BAM
$ /PATH/TO/SAMTOOLS/samtools rmdup -s SORTED_DATA.BAM DEDUP_DATA.BAM
$ /PATH/TO/SAMTOOLS/samtools index DEDUP_DATA.BAM

3.4.5. Hawaiian SNP Calling for Mapping (See Note 33)

$ /PATH/TO/FREEBAYES/freebayes -f /PATH/TO/reference_genomes/WS278_ref/Celegans.fa \
-F 0.01 -C 1 --pooled-continuous DEDUP_DATA.BAM > MAPPING_DATA.VCF
$ bedtools intersect -a MAPPING_DATA.VCF -b /PATH/TO/reference_genomes/Hawaiian_snps.vcf \
> SEQUENCE_DATA_HAW.VCF
$ /PATH/TO/BCFTOOLS/bcftools query -f '%CHROM %POS %AB\n' \
SEQUENCE_DATA_HAW.VCF > HAW_FREQUENCY.TXT
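
As an alternative to plotting every SNP (Note 33), the frequency file can be summarized in fixed-width bins before plotting. This is a minimal sketch, assuming the three space-separated columns written by the bcftools command above (chromosome, position, frequency); the function name `bin_haw` and the default 1 Mb bin width are hypothetical choices, not part of the published pipeline.

```shell
#!/bin/sh
# bin_haw [BIN_SIZE_BP]
# Reads "CHROM POS FREQ" records on standard input and prints, per bin:
# chromosome, bin start offset (bp), number of SNPs, mean frequency.
# If the third column holds several comma-separated values, only the first
# is used (a consequence of awk's numeric conversion).
bin_haw() {
    awk -v bin="${1:-1000000}" '
        NF >= 3 {
            start = int(($2 - 1) / bin) * bin
            key = $1 "\t" start
            n[key]++
            sum[key] += $3
        }
        END {
            for (key in n)
                printf "%s\t%d\t%.3f\n", key, n[key], sum[key] / n[key]
        }' | sort -k1,1 -k2,2n
}

# Toy example (made-up values).
bin_haw <<'EOF'
I 100 0.5
I 200 0.7
I 1500000 0.1
II 50 0.2
EOF

# Typical use:
#   bin_haw 1000000 < HAW_FREQUENCY.TXT > HAW_BINNED.TXT
```

Bins with no SNPs are not printed, so a run of missing or low-frequency bins on one chromosome would correspond to the gap described in Note 33.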

3.4.6. Variant Calling for Candidate Mutations (See Note 34)

$ /PATH/TO/FREEBAYES/freebayes -f /PATH/TO/reference_genomes/WS278_ref/Celegans.fa \
DEDUP_DATA.BAM > VARIANTS.VCF
$ /PATH/TO/FREEBAYES/vcflib/vcffilter -f "DP > 2" -g "GT = 1/1" VARIANTS.VCF > CANDIDATES.VCF
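
The filter criteria in Note 34 can be illustrated with a plain awk equivalent. This is a sketch for understanding the logic, not a replacement for vcffilter: it assumes a single-sample VCF with the total depth in the INFO field as `DP=` and the genotype in the `GT` subfield of the sample column. The function name `filter_candidates` is hypothetical.

```shell
#!/bin/sh
# filter_candidates
# Reads a single-sample VCF on standard input and prints header lines plus
# records with INFO depth greater than 2 and a homozygous variant genotype.
filter_candidates() {
    awk -F '\t' '
        /^#/ { print; next }
        {
            # Find DP among the semicolon-separated INFO entries (column 8).
            dp = -1
            n = split($8, info, ";")
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^DP=/) dp = substr(info[i], 4) + 0

            # Find GT by its position in the FORMAT column (column 9).
            gt = ""
            m = split($9, fmt, ":")
            split($10, smp, ":")
            for (i = 1; i <= m; i++)
                if (fmt[i] == "GT") gt = smp[i]

            if (dp > 2 && gt == "1/1") print
        }'
}

# Typical use:
#   filter_candidates < VARIANTS.VCF > CANDIDATES_CHECK.VCF
```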

3.4.7. Annotation of Candidate Mutations (See Note 35)

$ perl /PATH/TO/ANNOVAR/table_annovar.pl FILE.VCF /PATH/TO/reference_genomes/WS278_ref/ \
--buildver WS278 --out myanno --polish --remove --protocol refGene --operation g \
--vcfinput --xref WS278_ref/WS278.gene_xref.txt
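
Following Note 35, the annotated table can be narrowed to the variant classes most likely to be causative. This is a minimal sketch, assuming that ANNOVAR writes a tab-separated file (here taken to be myanno.WS278_multianno.txt) with columns named `Func.refGene` and `ExonicFunc.refGene`; both the file name and the column names should be checked against the actual output. The function name `prioritize` is hypothetical.

```shell
#!/bin/sh
# prioritize
# Reads a tab-separated ANNOVAR table on standard input and prints the header
# plus rows that are splicing variants or exonic variants other than
# synonymous or unclassified ones. Columns are located by header name.
prioritize() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "Func.refGene") f = i
                if ($i == "ExonicFunc.refGene") e = i
            }
            print
            next
        }
        f && e && ($f ~ /(^|;)splicing($|;)/ ||
                   ($e != "." && $e != "" && $e != "unknown" &&
                    $e != "synonymous SNV"))
    '
}

# Typical use:
#   prioritize < myanno.WS278_multianno.txt > PRIORITY_CANDIDATES.TXT
```

If either column name is not found in the header, only the header line is printed, which signals that the assumed layout does not match the file.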

4. Notes

  1. Instructions for making worm picks are available online, and picks are also available commercially (e.g., http://openwetware.org/wiki/BISC_219/F10:_Gene_Linkage#Making_a_Worm_Pick or http://www.wormbook.org/wbg/articles/volume-19-number-1/a-better-worm-pick-handle).

  2. NGM medium lacks antibiotics, so sterile technique is essential to prevent contamination. The cholesterol solution is flammable and does not require sterilization. The remaining solutions can be sterilized by autoclave or filtration. Airborne contaminants can be avoided by working in a laminar flow hood.

  3. Plates can be stored seeded or unseeded in airtight containers at 4 °C for several weeks.

  4. Strains are available from the Caenorhabditis Genetics Center (http://www.cgc.cbs.umn.edu).

  5. For strains that do not require mapping, begin with gDNA isolation.

  6. The mutation-bearing strain may be homozygous (preferable) or heterozygous (for phenotypes that preclude mating, such as nonconditional sterility or lethality). Conditional alleles that require maintenance at 15 °C should be allowed to mate for 24 h. Mutations that impair mating efficiency may require more hermaphrodites to ensure success.

  7. Successful mating produces equal numbers of male and hermaphrodite outcross progeny, which are first distinguishable at the L4 stage. Use only those plates with successful mating, especially when the starting strain is heterozygous.

  8. For heterozygous starting strains, only half of the F1 progeny will contain the mutation; therefore, the number of picked F1s should be doubled. Temperature-sensitive alleles should be incubated at the nonconditional temperature to allow discrimination of homozygous F2 mutants.

  9. This is a very important note: Mapping and variant calling are dependent on homozygosity at the mutant locus. The number of animals to pick is determined by the stage of development: 200 mid-stage embryos or 50 adult hermaphrodites yield an adequate amount of gDNA for library prep. The F2 animals contain the recombinant chromosomes used for polymorphism mapping; 50 animals is the recommended minimum for optimal mapping resolution. For bulk samples, wash with ≥10× volumes of TE per volume of packed worms.

  10. Washing is critical to remove as much of the OP50 bacteria as possible and minimize the amount of contaminating DNA in the sequencing library. Viable animals contain OP50 in the intestine. To remove, incubate the animals in M9 for 30 min with gentle shaking followed by multiple washes.

  11. Samples can be frozen at −80 °C at this point if desired. To continue, thaw in ice bath and proceed with step 6.

  12. Sonication disrupts the sample and shears the genomic DNA. If using a different instrument, parameters should be optimized to produce sheared gDNA in the 200–300 bp size range. Care should be taken to minimize sample heating and foaming.

  13. The suspension should be clear by the end of incubation, indicating complete digestion.

  14. Column purification kits are available from multiple vendors. Be sure that the protocol is compatible with samples containing ~1% SDS.

  15. The amount of sheared gDNA should be 2–10 ng, which can be quantified by dye fluorometry or qPCR. Remember to increase the elution volume accordingly, leaving a final volume of 10 μL after quantification.

  16. Some components of the library prep kit are viscous and require thorough mixing. After combining the reaction components, set a pipet (100 μL or 200 μL) to ~80% of the total reaction volume and pipet up and down multiple times. If necessary, spin the tube briefly before incubation to collect the liquid at the bottom of the tube. The presence of small bubbles will not adversely affect the reaction.

  17. The proper dilution ratio is determined by the amount of sheared gDNA input. For ≥5 ng, dilute 1:10 (= 1.5 μM concentration); for <5 ng, dilute 1:25 (= 0.6 μM concentration).

  18. The samples can be stored overnight at −20 °C.

  19. Resuspend beads by vortexing immediately before pipetting; perform all incubations at room temperature.

  20. Depending upon your magnetic stand, it may be necessary to transfer the sample to a tube of different size (e.g., 1.5 mL).

  21. The solution will clear as beads adhere to the side of the tube adjacent to the magnet. When removing the supernatant, pipette slowly and carefully to avoid the bead pellet and bound gDNA.

  22. For ≥5 ng of input gDNA, use eight cycles of amplification; for <5 ng of input gDNA, use 10 cycles of amplification.

  23. Alternatively, a low elution volume purification column can be used for the final clean-up.

  24. The library is now ready for quantification, using a method suitable for DNA concentrations in the low ng/μL range (e.g., dye fluorometry or qPCR). UV absorbance is relatively insensitive and therefore not recommended. Library quality can be assessed using the Agilent Bioanalyzer with a high-sensitivity DNA chip to detect the size distribution. If adapter dimer contamination (a discrete band of ~120 bp) is observed, it can be removed by size fractionation via gel isolation or AMPure bead size selection. Libraries constructed with different index primers can be pooled (aka multiplexed). The recommended minimum sequencing depth for mutation identification is 20-fold genome coverage, or 2 × 10^9 bp. The amount of data one obtains is determined by the sequencing capacity of the instrument and the degree of multiplexing; consult your sequencing service provider for guidance.

  25. These steps are performed only once prior to running the analysis pipeline.

  26. Follow the developers’ instructions for the installation of software, including dependencies.

  27. The default directory for downloads is typically located in your home directory (/home/USERNAME/Downloads). Directories and files can be relocated to a different directory via the “mv” (move) command and specifying the path to the new location. A common convention is to create a directory named “Sequencing” in your home directory ($ mkdir /home/USERNAME/Sequencing) and then creating subdirectories (e.g., software, reference_genomes, data) there.

  28. This is a very important note: Use the identical version of the reference genome for the entire workflow, including any downstream applications. For example, if you intend to use the UCSC Genome Browser for visualization, then download the reference genome from that source. The current WormBase version at the time of publication was WS278; different versions can be obtained by changing the number after “WS” in the link.

  29. Command-line instructions use the following conventions: (a) Each command is preceded by a dollar sign ($); (b) User-specific variables are shown in UPPER_CASE_ITALICS; (c) Single-line commands that are too long to fit the page are split by a backslash (\); omit the backslash if the command is typed on one line; (d) These instructions provide explicit paths to the software for operation; alternatively, the software directory can be added to the $PATH variable.

  30. Your sequencing service provider will provide instructions for downloading your data. The data should be provided in unaligned FASTQ format (suffix “.fq” or “.fastq”). It may also be compressed (typically in gzip format; suffix “.gz”).

  31. The commands in this example assume single-end sequencing (typically annotated as “R1”); paired-end data (typically annotated as “R1” and “R2”) will require modification of the commands. Command usage and flag descriptions can be found at https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/.

  32. Common adapter sequences are supplied with BBMap in the “resources” directory.

  33. The final file HAW_FREQUENCY.TXT contains chromosome, position, and Hawaiian SNP frequency information. The data for each chromosome can be plotted using Excel or R to determine the mapping interval (a gap in the Hawaiian SNP plot).

  34. Candidate mutations are defined as homozygous variants (GT = 1/1) with a minimum coverage of three reads (DP > 2) to avoid spurious variant calls arising from low coverage.

  35. The output file contains all categories of sequence variants. The overwhelming majority of causative mutations alter the coding potential of genes, so prioritize non-synonymous and splicing variants for subsequent validation.

Acknowledgments

I would like to thank Sevinc Ercan for sharing the small-scale gDNA isolation protocol. This work was supported by the Intramural Research Program of the National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, and is subject to the NIH Public Access Policy.

References

  1. Morgan TH (1911a) The origin of nine wing mutations in Drosophila. Science 33:496–499
  2. Morgan TH (1911b) The origin of five mutations in eye color in Drosophila and their modes of inheritance. Science 33:534–537
  3. Brenner S (1974) The genetics of Caenorhabditis elegans. Genetics 77:71–94
  4. Hirsh D, Vanderslice R (1976) Temperature-sensitive developmental mutants of Caenorhabditis elegans. Dev Biol 49:220–235
  5. Sarin S, Prabhu S, O’Meara MM et al. (2008) Caenorhabditis elegans mutant allele identification by whole-genome sequencing. Nat Methods 5:865–867
  6. Sarin S, Bertrand V, Bigelow H et al. (2010) Analysis of multiple ethyl methanesulfonate-mutagenized Caenorhabditis elegans strains by whole-genome sequencing. Genetics 185:417–430
  7. Zuryn S, Le Gras S, Jamet K et al. (2010) A strategy for direct mapping and identification of mutations by whole-genome sequencing. Genetics 186:427–430
  8. Wicks SR, Yeh RT, Gish WR et al. (2001) Rapid gene mapping in Caenorhabditis elegans using a high density polymorphism map. Nat Genet 28:160–164
  9. Doitsidou M, Poole RJ, Sarin S et al. (2010) C. elegans mutant identification with a one-step whole-genome-sequencing and SNP mapping strategy. PLoS One 5:e15435
  10. Fire A, Xu S, Montgomery MK et al. (1998) Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391:806–811
  11. Friedland AE, Tzur YB, Esvelt KM et al. (2013) Heritable genome editing in C. elegans via a CRISPR-Cas9 system. Nat Methods 10:741–743
  12. Smith HE, Fabritius AS, Jaramillo-Lambert A et al. (2016) Mapping challenging mutations by whole-genome sequencing. G3 (Bethesda) 6:1297–1304
  13. Mok CA, Au V, Thompson OA et al. (2017) MIP-MAP: high-throughput mapping of Caenorhabditis elegans temperature-sensitive mutants via molecular inversion probes. Genetics 9:3477–3488
  14. Smith HE, Yun S (2017) Evaluating alignment and variant-calling software for mutation identification in C. elegans by whole-genome sequencing. PLoS One 12:e0174446
  15. Afgan E, Baker D, Batut B et al. (2018) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucl Acids Res 46:W537–W544
  16. Maier W, Moos K, Seifert M et al. (2014) MiModD - mutation identification in model organism genomes. https://sourceforge.net/projects/mimodd/
