Figure - PMC

Author manuscript; available in PMC: 2012 Nov 1.

Published in final edited form as: Nat Methods. 2012 Apr 27;9(5):459–462. doi: 10.1038/nmeth.1974

Data Flow in the 1000 Genomes Project. The sequencing centers submit their raw data to one of the two SRA databases (arrow 1), which exchange data with each other. The DCC retrieves FASTQ files from the SRA (arrow 2) and performs QC steps on the data. The analysis group accesses data from the DCC (arrow 3), aligns the sequence data to the genome, and uses the alignments to call variants. Both the alignment files and the variant files are provided back to the DCC (arrow 4). All data are publicly released as soon as possible. Sequencing center names are provided in Supplementary Table 1.