Table 1.
Description of haplotyping programs for unrelated individuals, divided into four classes based on method.
Program name | Algorithm | Output^a | Missing data^b | Assumptions^c | Key features | Limitations | Max. subjects, loci and marker type | Platform | Ref.^d |
---|---|---|---|---|---|---|---|---|---|
Parsimony methods | |||||||||
1. Simple parsimony | |||||||||
HAPAR | Parsimony | HA | No | None | Overcomes limitations of HAPINFERX; increasing sample size improves accuracy | May be susceptible to HWE departures | Practical limit, biallelic | PC/UNIX | [39] |
HAPINFERX | Clark's | HA | No | None | Intuitive method, fast; reduced number of haplotypes; no limit on number of loci | May fail to start; sensitive to data order; unstable and erroneous estimates | Practical limit, biallelic/multiallelic | UNIX | [37] |
2. Phylogeny | |||||||||
BPPH | IP | HA | No | IP | Similar to HAPH; speed | User interface | Practical limit, biallelic | MAC | [45] |
DPPH | PP | HA | No | PP | Handles large datasets; speed | Theoretical; strict population assumptions | Practical limit, biallelic | MAC | [40,43] |
GPPH | PP | HA | No | PP | Handles large datasets; speed | Theoretical; strict population assumptions | Practical limit, biallelic | MAC/PC/UNIX | [40,42] |
HAPH | IP | HA/HF | Yes | HWE, IP | Predicts haplotype blocks; constructs haplotypes within blocks; identifies block structure; web-based | No probability for haplotype assignments | Max 500 loci, practical limit, biallelic | Web-based | [44] |
Likelihood methods | |||||||||
1. Maximum likelihood | |||||||||
Arlequin v2.0 | EM | HA/HF | No | HWE | Includes numerous population genetic analysis tools | EM issues | EM Practical Limits, biallelic/multiallelic | JRE on MAC/PC/UNIX | [89] |
CHAPLIN | ECM | HF | Yes | HWE | Graphical interface; association tests; HWE assumption relaxed in case sample | ECM algorithm needs to be compared with standard EM methods | Practical limits, biallelic/multiallelic | PC | [91] |
EH | EM | HF | No | HWE | Estimates haplotype frequency; compares case-control HF under different assumptions | EM issues; must specify mode of inheritance and penetrance of disease | No max; practical max 3-4, biallelic/multiallelic | PC | [85,104] |
EHPLUS | EM | HF | No | HWE | Improves EH, more loci and polymorphic markers; incorporates model-free analysis | Long run times for permutation calculations | Max 5 loci, 15 alleles in analysis | PC/UNIX | [84] |
EM-DeCODER | EM | HA/HF | No | HWE | Program with standard EM algorithm | EM issues | Max 15 loci, biallelic | UNIX | [57] |
FASTEHPLUS | EM | HF | No | HWE | Similar to EHPLUS, with speed improvements | EM issues | Max 5 loci, 15 alleles in analysis | PC/UNIX | [105] |
GENECOUNTING | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; compares global and specific haplotypes between groups | Missing data limited to biallelic loci; EM issues | 10-15 loci practical limit, biallelic/multiallelic | PC/UNIX | [106] |
GCHAP | EM | HA/HF | Yes | HWE | Haplotypes with zero likelihood dropped to improve speed and accuracy; similar to SNPHAP | EM issues | 20 loci practical limit, biallelic | JRE on PC/UNIX | [107,108] |
GS-EM | EM | HA/HF | Yes | HWE | Includes algorithm for assigning probability to genotype calls from several genotyping methods; haplotypes constructed using assigned genotype probabilities; web-based | EM issues; limited to biallelic SNPs | Practical limit, biallelic | Web-based | [73] |
HAPZ | EM | HA/HF | Yes | HWE | Modified version of SNPHAP that accommodates multiallelic loci | EM issues | Practical limit, biallelic/multiallelic | PC/UNIX | [106] |
HAPMAX | MLE | HF | No | HWE | Ease of use; interface | Accommodates a limited number of SNPs | 8 loci, biallelic | PC | [109] |
HAPLOH | EM | HF | Yes | HWE | Handles some missing data; utilises pedigree data, if available; calculates standard error | EM issues | 10 loci, 40 alleles max, biallelic/multiallelic | UNIX | [47] |
HAPLOSCOPE | EM/MCMC | † | † | † | Platform program, incorporates SNPHAP and PHASE v1.0; facilitates comparison/testing; graphical interface, identifies tagging SNPs and LD blocks | See individual programs for limitations/features | † | UNIX/Windows | [110] |
HAPLOVIEW | EM+PL | HA/HF | Yes | HWE | Calculates pairwise LD; checks for recombination; identifies tagging SNPs; accepts pedigree and unrelated genotype data | EM issues | 100s, practical limit, biallelic | JRE on MAC/PC/UNIX | [56] |
HAPLO.STATS | EM | HA/HF | Yes | HWE | Incorporates method similar to SNPHAP, with user inputs; separate programs that (1) assign haplotypes with posterior probability of assignments, (2) allow linear regression for trait-to-haplotype analysis and (3) calculate score statistic for haplotype-phenotype association | Requires knowledge of S-PLUS 6.0 or R; EM issues | Practical limit, biallelic/multiallelic | S-PLUS 6.0 on UNIX/R on UNIX & PC | [86] |
HIT | EM/MCMC/MC+PL | † | † | † | Platform program, incorporates SNPHAP and PHASE v1.0; facilitates comparison; graphical interface, identifies tagging SNPs and LD blocks | See individual programs for limitations/features | † | * | [111] |
HPLUS | EM+EE+PL | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; compares haplotype frequencies between groups, adjusts for covariates; utilises pedigree data, if available | Requires MATLAB; EM issues | 100 loci, biallelic | MATLAB on PC/UNIX | [55,83] |
LDSUPPORT | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; identifies LD blocks for haplotype reconstruction; examines association with disease, automation speeds process | EM issues | * | UNIX | [29,112] |
LOGINSERM ESTIHAPLO | EM | HA/HF | Yes | HWE | Uses ML method to infer haplotypes for individuals with missing data; offers option to exclude individuals with missing data | EM issues | Practical limit, biallelic/multiallelic | PC/UNIX | [80] |
MLHAPFRE | EM | HF | Yes | HWE | Performance improves with presence of LD; performs well with large sample size | Incorporated into Arlequin; EM issues | 16 loci, biallelic | JRE on MAC/PC/UNIX | [48] |
MLOCUS | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; notes observed vs. inferred haplotypes; calculates pairwise LD | EM issues | 11 loci, biallelic/multiallelic | PC | [46,113] |
OSLEM | EM | Yes | No | HWE | Modified EM algorithm that runs 2 × faster | EM issues | Practical limit, biallelic | Web-based | [114] |
PL-EM | EM+PL | HA/HF | Yes | HWE | Combines PL with EM; EM-based version of HAPLOTYPER; calculates variance of haplotype frequency estimates | EM issues | 100s, practical limit, biallelic | PC/UNIX | [54] |
SAS Genetics | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; incorporates statistical tests and procedures | Requires SAS; EM issues | Practical limit, biallelic/multiallelic | SAS on PC/UNIX | [115] |
SNPEM | EM | HF | No | HWE | Estimates haplotype frequency by population; compares global and specific haplotypes between 2 groups | EM issues | 10 loci, biallelic | UNIX | [10] |
SNPHAP | EM | HA/HF | Yes | HWE | Uses posterior and prior trimming to handle large numbers of loci; provides posterior probabilities for assigned haplotypes | EM issues | Practical limit, biallelic | UNIX | [52] |
THESIAS | S-EM | HF | Yes | HWE | Stochastic EM avoids issues of standard EM programs; includes tests for haplotype-phenotype association; accommodates large sample sizes | S-EM algorithm needs to be compared with standard EM methods | Practical limit, 20 loci, biallelic | PC/UNIX | [53,88] |
WHAP | EM | † | † | † | Uses haplotype output from SNPHAP for association testing; allows weighted association analysis | EM issues; requires separate haplotyping program | † | PC/UNIX | [116] |
Zaykin et al. | EM | HF | No | HWE | Program for analysis of haplotype-phenotype association; subjects with missing data ignored | EM issues | Practical limit, biallelic/multiallelic | PC/UNIX | [82] |
Zou and Zhao | MLE/EM | HF | Yes | HWE | Adjusts haplotype frequency estimates for genotyping error; program also works for nuclear families | Assumes genotyping errors are random; assumes error rates are known | Practical limits, biallelic/multiallelic | * | [68] |
3locus.PAS | EM | HF | Yes | HWE | Handles some missing data; various tests available; improves with increasing sample size | EM issues | 3 loci, biallelic/multiallelic | PC/UNIX | [46] |
2. Simple Bayesian | |||||||||
HAPLOTYPER | MC+PL | HA/HF | Yes | HWE | Uses PL algorithm to construct haplotypes with many loci; provides posterior probabilities for assigned haplotypes | Long run times; posterior probabilities may be difficult to interpret | 256 max, biallelic | UNIX | [57] |
HAPLOREC | MC-VL | HA/HF | Yes | HWE | Uses variable-length chain based on maximising LD; handles large numbers of loci | Restarts avoid non-global optimum | Practical limit, biallelic | Java virtual machine, v1.4 or newer | [62] |
3. Coalescent-based Bayesian^e | |||||||||
Arlequin v3.0 | ELB | HA/HF | No | Adaptive window | Includes numerous population genetics analyses; handles recombination | Long run times | 1,000s, biallelic/multiallelic | JRE on LINUX/PC/MAC | [60,89] |
PHASE v2.0 | MCMC+PL | HA/HF | Yes | Coalescent/HWE | Improved run time; compares haplotype frequency between groups; handles recombination; provides posterior probabilities for assigned haplotypes | Departure from coalescent model may impact performance; posterior probabilities may be difficult to interpret | Practical limit, biallelic/multiallelic | PC/MAC/UNIX | [59] |
PHASE v1.0 | MCMC | HA/HF | No | Coalescent/HWE | Incorporates population-genetic and coalescent ideas; incorporates known phase and trio pedigrees into analysis; provides posterior probabilities for assigned haplotypes | Departures from coalescent model may impact performance; slow run times; posterior probabilities may be difficult to interpret | Practical limit, biallelic/multiallelic | UNIX | [51] |
SLHAP v1.0 | MCMC | HA/HF | Yes | Neutral coalescent/HWE | Similar to PHASE v1.0; handles missing data; improved run time | Departures from coalescent model may impact performance | Practical limit, biallelic/multiallelic | UNIX | [58] |
^a Haplotype output of the program: individual assignment, frequency estimates or both.
^b Ability of the program to accept missing data.
^c Program assumptions.
^d List of references.
^e Programs in this section make assumptions based on, or draw inference from, the coalescent model.
*Could not determine from available data.
†See incorporated programs for features and limitations.
EE: Estimating equation; ECM: Expectation conditional maximisation algorithm; ELB: Excoffier-Laval-Balding algorithm, Bayesian; EM: Expectation maximisation algorithm; EM issues: May be sensitive to HWE departures, long run times, and non-global max (requiring multiple restarts); HF: Haplotype frequency estimate; HA: Individual haplotype assignment; HWE: Hardy-Weinberg equilibrium; IP: Imperfect phylogeny-based method; JRE: Java runtime environment; LD: Linkage disequilibrium; MAC: Program runs on Apple computer; MC: Monte Carlo algorithm, Bayesian algorithm; MCMC: Markov Chain Monte Carlo algorithm, Bayesian algorithm; MC-VL: Monte Carlo-variable length chain algorithm, Bayesian algorithm; MLE: Maximum likelihood estimation algorithm; PC: IBM compatible personal computer; PL: Partition ligation algorithm; PP: Perfect phylogeny-based method; Practical limit: program has no upper limit on number of markers and/or subjects, however computational and practical considerations limit this value; S-EM: Stochastic EM algorithm; UNIX: Runs on Unix-like operating systems, including Linux, Solaris and others.
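Most of the likelihood-based programs in Table 1 share one core computation: EM estimation of haplotype frequencies from unphased genotypes under HWE. To make "EM" and "EM issues" concrete, the following is a minimal Python sketch, assuming biallelic SNPs coded as 0/1/2 copies of one allele, no missing data and HWE. It is not the implementation of any program listed, and the function names (`compatible_pairs`, `em_haplotype_freqs`, `best_of_restarts`) are hypothetical. The E-step weights each haplotype pair consistent with a genotype by its HWE probability, the M-step re-counts haplotypes, and several random starts are run with the highest log-likelihood kept, because a single EM run can stop at a non-global maximum.

```python
import math
import random
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of 0/1/2 allele counts, one entry per biallelic SNP.
    With k heterozygous sites there are 2**(k-1) pairs (one pair if k == 0).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Fix the first heterozygous site to allele 1 on h1 so each pair appears once.
    for rest in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het, (1,) + rest):
            h1[site], h2[site] = allele, 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def pair_prob(freq, a, b):
    """HWE probability of the unordered haplotype pair {a, b}."""
    return freq[a] * freq[b] * (1 if a == b else 2)


def log_likelihood(freq, candidates):
    """Sample log-likelihood: sum over subjects of log P(genotype)."""
    total = 0.0
    for pairs in candidates:
        z = sum(pair_prob(freq, a, b) for a, b in pairs)
        total += math.log(z) if z > 0 else -math.inf
    return total


def em_haplotype_freqs(candidates, haps, rng, max_iter=500, tol=1e-10):
    """One EM run from a random starting point (hypothetical helper)."""
    weights = [rng.random() + 0.01 for _ in haps]
    s = sum(weights)
    freq = {h: w / s for h, w in zip(haps, weights)}
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior probability of each compatible pair.
            probs = [pair_prob(freq, a, b) for a, b in pairs]
            z = sum(probs)
            for (a, b), p in zip(pairs, probs):
                counts[a] += p / z
                counts[b] += p / z
        # M-step: expected haplotype counts divided by 2n chromosomes.
        new = {h: counts[h] / (2 * len(candidates)) for h in haps}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq


def best_of_restarts(genotypes, n_starts=5, seed=0):
    """Run EM from several random starts; keep the highest-likelihood result."""
    rng = random.Random(seed)
    candidates = [compatible_pairs(tuple(g)) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    runs = [em_haplotype_freqs(candidates, haps, rng) for _ in range(n_starts)]
    best = max(runs, key=lambda f: log_likelihood(f, candidates))
    return best, log_likelihood(best, candidates)
```

For example, `best_of_restarts([(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)])` assigns the double heterozygote to the pair {(1, 1), (0, 0)}, because those haplotypes are already common among the homozygotes. With k heterozygous sites the enumeration in `compatible_pairs` grows as 2^(k-1), which is one reason EM programs in the table carry practical locus limits or add devices such as SNPHAP's trimming or partition ligation (PL-EM, HAPLOTYPER).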