2014 Jun 24;7(9):1026–1042. doi: 10.1111/eva.12178
Some important points to consider
• Availability of appropriate computational resources
• Collaboration with sequencing facility and bioinformatics groups
• Plan for amount and type of sequencing data needed
• Does funding allow sufficient sequence coverage to be produced? If not, consider alternative approaches rather than producing a poor, low-coverage assembly
• Familiarization with data handling pipelines and file formats (see below)
• High-quality DNA sample (with individual metadata)
• Plan for analyses and publication
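The coverage question above can be checked with simple arithmetic: expected average coverage is roughly the total number of sequenced bases divided by the genome size. The following is a minimal sketch only; the function names and the example figures (a 1.2 Gb genome, 100 bp reads, a 50x target) are hypothetical and not taken from the text, and the estimate ignores losses from quality filtering, duplicates and unmappable regions.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size (all in bases)."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical planning example: 1.2 Gb genome, 100 bp reads.
genome_size = 1_200_000_000
print(expected_coverage(600_000_000, 100, genome_size))  # average depth
print(reads_needed(50, 100, genome_size))                # reads for 50x
```

If the number of reads the budget allows falls well short of the target, that is the point at which the alternative approaches mentioned above become worth considering.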
Some useful resources
Internet forums for discussions related to genome sequencing
http://seqanswers.com/
http://www.biostars.org/
http://www.biosupport.se/
Entry points to genome sequencing, assembly and exemplary downstream analyses
• Library preparation and sequencing: Mardis (2008, 2013)
• Quality filtering/preprocessing: Patel and Jain (2012), Zhou and Rokas (2014), Smeds and Künstner (2011)
• Genome assembly: Nagarajan and Pop (2013), Pop (2009), Flicek and Birney (2009)
• Assembly evaluation: Earl et al. (2011), Bradnam et al. (2013), Bao et al. (2011)
• Genome annotation: Yandell and Ence (2012)
• Mapping: Li and Durbin (2009), Trapnell and Salzberg (2009), Bao et al. (2011)
• Data handling: Li et al. (2009), Quinlan and Hall (2010)
• Variant calling: Nielsen et al. (2011), DePristo et al. (2011), Van der Auwera et al. (2013)
• Haplotype-based approaches: Browning and Browning (2011), Tewhey et al. (2011), Lawson et al. (2012)
• Population genomic summary statistics: Nielsen et al. (2012b), Danecek et al. (2011)
Web resources
• Galaxy (http://galaxyproject.org/)
• Amazon cloud (http://aws.amazon.com/ec2/)
• Windows Azure (http://www.windowsazure.com/)
• Magellan: Cloud Computing for Science (http://www.alcf.anl.gov/magellan)
• Web Apollo (http://genomearchitect.org/)
• NCBI BioProject (http://www.ncbi.nlm.nih.gov/bioproject/)
• Genomes OnLine Database (http://genomesonline.org/cgi-bin/GOLD/index.cgi)
• ENSEMBL genome database (http://www.ensembl.org/index.html)
• UCSC Genome Browser (http://genome.ucsc.edu/)
• FastQC toolkit for quality control during data preprocessing (http://www.bioinformatics.babraham.ac.uk/projects/fastqc)
Genome size databases
• Plants: http://data.kew.org/cvalues/
• Animals: http://www.genomesize.com/
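Genome size databases such as these typically report C-values in picograms of DNA, whereas sequencing plans are usually made in base pairs. The sketch below assumes the commonly used conversion factor of 1 pg ≈ 978 Mb; the constant and function names are illustrative and not part of the text.

```python
# Assumed conversion factor: 1 pg of double-stranded DNA ~ 0.978e9 base pairs.
BP_PER_PG = 0.978e9


def c_value_to_bp(c_value_pg: float) -> float:
    """Convert a haploid C-value in picograms to an approximate size in base pairs."""
    return c_value_pg * BP_PER_PG


# Hypothetical example: a species with a reported C-value of 1.5 pg.
print(f"{c_value_to_bp(1.5) / 1e9:.2f} Gb")
```

The result is an estimate of haploid genome size and can serve as the genome size input when planning sequencing depth.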
Common file formats
• FASTA: Nucleotide sequence (file extension .fas or .fa)
• FASTQ: Nucleotide sequence including quality scores
• SAM: Sequence alignment
• BAM: Binary version of SAM
• GFF3: Annotation
• GTF: Annotation
• BED: Annotation
• VCF: Variant calling
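As a small illustration of the FASTQ format listed above, the sketch below reads records and decodes their quality strings into Phred scores. It assumes the common four-lines-per-record layout with no blank lines and the Phred+33 quality encoding; the function names are hypothetical, and for real projects an established parser is preferable to hand-written code.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (assumes no blank lines between records)
            return
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer Phred scores (assumes Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


# Made-up single-record example.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for read_id, seq, qual in read_fastq(example):
    print(read_id, seq, phred_scores(qual))
```

A Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so the decoded scores can feed directly into quality filtering.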