Figure 3.
Flowchart of the complete imputation process (see the EPIGEN-Brazil Scientific Workflow: http://www.ldgh.com.br/scientificworkflow/flowcharts.php). (A) Overview of the complete imputation process. (B,C) Two preliminary tasks may be required before imputation when reference panels must be created or merged. The Reference Panel Creation task (B, and orange processes in A) converts a data set of unphased genotypes into a reference panel, producing the EPIGEN-5M Reference Panel of haplotypes from the EPIGEN-5M data set. The Merge Reference Panels task (C, and pink processes in A) combines two reference panels using IMPUTE2, generating the EPIGEN-5M+1KGP Reference Panel. The imputation process itself consists of three main tasks: pre-phasing, haplotype phase inference, and imputation. The pre-phasing task (D, and green processes in A) performs strand alignment between the target data set and the reference panel using SHAPEIT2, PLINK, and AWK scripts. The haplotype phase inference task (yellow processes in A) phases the target data set with SHAPEIT2, generating .haps and .sample files (the target data set aligned and phased with the Reference Panel). These files serve as input for the imputation task (red processes in A), conducted with IMPUTE2 following the “best practices” guidelines in the software documentation.
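As an illustration only, the three main tasks in the caption can be sketched as a sequence of command lines. This is a minimal sketch, not the EPIGEN workflow itself: the file names, the reference panel file suffixes, and the helper functions are hypothetical, the options shown are the commonly documented SHAPEIT2 and IMPUTE2 ones, and the PLINK/AWK steps that act on the strand-check log are omitted. The code only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List


def strand_check_cmd(target: str, gmap: str, ref: str) -> List[str]:
    """Pre-phasing: SHAPEIT2 alignment check of the target against the reference panel."""
    return [
        "shapeit", "-check",
        "--input-bed", f"{target}.bed", f"{target}.bim", f"{target}.fam",
        "--input-map", gmap,
        # Assumed reference panel layout: <ref>.hap.gz, <ref>.legend.gz, <ref>.sample
        "--input-ref", f"{ref}.hap.gz", f"{ref}.legend.gz", f"{ref}.sample",
        "--output-log", f"{target}.alignments",
    ]


def phase_cmd(target: str, gmap: str) -> List[str]:
    """Haplotype phase inference: SHAPEIT2 writes the .haps and .sample files."""
    return [
        "shapeit",
        "--input-bed", f"{target}.bed", f"{target}.bim", f"{target}.fam",
        "--input-map", gmap,
        "--output-max", f"{target}.phased.haps", f"{target}.phased.sample",
    ]


def impute_cmd(target: str, gmap: str, ref: str, start: int, end: int) -> List[str]:
    """Imputation: IMPUTE2 on pre-phased haplotypes for one genomic interval."""
    return [
        "impute2", "-use_prephased_g",
        "-known_haps_g", f"{target}.phased.haps",
        "-m", gmap,
        "-h", f"{ref}.hap.gz",
        "-l", f"{ref}.legend.gz",
        "-int", str(start), str(end),
        "-o", f"{target}.imputed",
    ]


def build_pipeline(target: str, gmap: str, ref: str, start: int, end: int) -> List[List[str]]:
    """Return the three commands in the order described in the caption."""
    return [
        strand_check_cmd(target, gmap, ref),
        phase_cmd(target, gmap),
        impute_cmd(target, gmap, ref, start, end),
    ]


if __name__ == "__main__":
    # Hypothetical inputs for a single chromosome chunk.
    for cmd in build_pipeline("target_chr22", "genetic_map_chr22.txt",
                              "ref_chr22", 20_000_000, 25_000_000):
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps each task separately inspectable, which mirrors the separate colored processes in panel A.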