| Some important points to consider | |
| • Availability of appropriate computational resources | |
| • Collaboration with sequencing facility and bioinformatics groups | |
| • Plan for amount and type of sequencing data needed | |
| • Does funding allow for sufficient sequence coverage? If not, alternative approaches should be considered rather than producing a poor, low-coverage assembly | |
| • Familiarization with data handling pipelines and file formats (see below) | |
| • High-quality DNA sample (with individual metadata) | |
| • Plan for analyses and publication | |
| Some useful resources | |
| Internet forums for discussions related to genome sequencing | |
| • http://seqanswers.com/ | |
| • http://www.biostars.org/ | |
| • http://www.biosupport.se/ | |
| Entry points to genome sequencing, assembly and exemplary downstream analyses | |
| • Library preparation and sequencing: Mardis (2008, 2013) | |
| • Quality filtering/preprocessing: Patel and Jain (2012), Zhou and Rokas (2014), Smeds and Künstner (2011) | |
| • Genome assembly: Nagarajan and Pop (2013), Pop (2009), Flicek and Birney (2009) | |
| • Assembly evaluation: Earl et al. (2011), Bradnam et al. (2013), Bao et al. (2011) | |
| • Genome annotation: Yandell and Ence (2012) | |
| • Mapping: Li and Durbin (2009), Trapnell and Salzberg (2009), Bao et al. (2011) | |
| • Data handling: Li et al. (2009), Quinlan and Hall (2010) | |
| • Variant calling: Nielsen et al. (2011), DePristo et al. (2011), Van der Auwera et al. (2013) | |
| • Haplotype-based approaches: Browning and Browning (2011), Tewhey et al. (2011), Lawson et al. (2012) | |
| • Population genomic summary statistics: Nielsen et al. (2012b), Danecek et al. (2011) | |
| Web resources | |
| • Galaxy (http://galaxyproject.org/) | |
| • Amazon cloud (http://aws.amazon.com/ec2/) | |
| • Windows Azure (http://www.windowsazure.com/) | |
| • Magellan: Cloud Computing for Science (http://www.alcf.anl.gov/magellan) | |
| • Web Apollo (http://genomearchitect.org/) | |
| • NCBI BioProject (http://www.ncbi.nlm.nih.gov/bioproject/) | |
| • Genomes OnLine Database (http://genomesonline.org/cgi-bin/GOLD/index.cgi) | |
| • ENSEMBL genome database (http://www.ensembl.org/index.html) | |
| • UCSC Genome Browser (http://genomebrowser.wustl.edu/) | |
| • FastQC toolkit for data preprocessing (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) | |
| Genome size databases | |
| • Plants: http://data.kew.org/cvalues/ | |
| • Animals: http://www.genomesize.com/ | |
| Common file formats | |
| • FASTA | Nucleotide sequence (file extension .fas or .fa) |
| • FASTQ | Nucleotide sequence including quality scores |
| • SAM | Sequence alignment |
| • BAM | Binary version of SAM |
| • GFF3 | Annotation |
| • GTF | Annotation |
| • BED | Annotation |
| • VCF | Variant calling |
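
To make the FASTQ entry and the coverage question above more concrete, the following is a minimal Python sketch, not a replacement for established tools. It assumes the simple four-lines-per-record FASTQ layout and Phred+33 quality encoding (older data may use a different offset), and the helper names (`read_fastq`, `phred33`, `expected_coverage`), the two example reads and the genome size are all hypothetical values chosen for illustration. Coverage is approximated as the total number of sequenced bases divided by the estimated genome size.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a FASTQ stream.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        handle.readline()       # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        yield header[1:].strip(), seq, qual


def phred33(quality):
    """Decode a quality string into Phred scores, assuming a +33 ASCII offset."""
    return [ord(char) - 33 for char in quality]


def expected_coverage(total_bases, genome_size):
    """Approximate average coverage as sequenced bases per genome base."""
    return total_bases / genome_size


# Hypothetical two-read example; real input would be an opened .fastq file.
EXAMPLE = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGTAC\n+\n!!!!II\n"

records = list(read_fastq(io.StringIO(EXAMPLE)))
total_bases = sum(len(seq) for _, seq, _ in records)

# Toy genome size of 7 bp, purely to keep the arithmetic visible.
coverage = expected_coverage(total_bases, genome_size=7)
print(f"{len(records)} reads, {total_bases} bases, about {coverage:.1f}x coverage")
```

For a real project, the genome size would typically come from one of the genome size databases listed above, and the same calculation can be run in reverse to estimate how many bases need to be sequenced to reach a target coverage.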