2011 Mar;18(3):401–413. doi: 10.1089/cmb.2010.0253

Table 2.

Probabilities of Alignment Errors

	Operation	Probability
Substitute	A → C, C → G, G → T, T → A	0.31
Substitute	A → T, C → T, G → A, T → C	0.31
Substitute	A → G, C → A, G → C, T → G	0.31
Insert	1 N	0.056
Insert	2 N's	0.0041
Insert	3 N's	0.0016
Insert	4 N's	0.00069
Insert	5 N's	0.00038
Insert	6 N's	0.00041
Insert	7 N's	0.00030
Insert	8 N's	0.00027
Insert	9 N's	0.00056
Insert	≥ 10 N's	0.0019
Insert	A	0.001225
Insert	C	0.001225
Insert	G	0.001225
Insert	T	0.001225
Delete		0.0049

The probabilities are estimated empirically by aligning Chromosome 1 fragments from GAhum against the human reference (hg18). Because the reference base is known at each position, the 12 possible substitutions are encoded with only three symbols: each symbol stands for four substitutions, one per reference base. Insertions of A, C, G, or T and deletions add five more symbols. Runs of Ns are encoded with a distinct symbol for each run length from 1 to 9, and a single symbol covers all runs of 10 or more Ns, which yields a lower bound on the entropy of the distribution.
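
The following Python sketch illustrates two ideas from this note: how a reference base plus one of three symbols recovers a substitution, and how the entropy of the symbol distribution can be computed from the table. It is an illustration under stated assumptions, not the authors' implementation. The names `TABLE2`, `SUBSTITUTION_CLASSES`, `encode_substitution`, `decode_substitution`, and `entropy_bits` are hypothetical, each table row is assumed to correspond to exactly one symbol, and because the rounded table values sum to slightly more than 1, the sketch renormalizes them before computing the entropy.

```python
import math

# Probabilities copied from Table 2, one entry per symbol (18 symbols).
# Assumption: each table row corresponds to exactly one encoded symbol.
TABLE2 = {
    "sub_1": 0.31, "sub_2": 0.31, "sub_3": 0.31,
    "ins_N1": 0.056, "ins_N2": 0.0041, "ins_N3": 0.0016,
    "ins_N4": 0.00069, "ins_N5": 0.00038, "ins_N6": 0.00041,
    "ins_N7": 0.00030, "ins_N8": 0.00027, "ins_N9": 0.00056,
    "ins_N10plus": 0.0019,
    "ins_A": 0.001225, "ins_C": 0.001225,
    "ins_G": 0.001225, "ins_T": 0.001225,
    "del": 0.0049,
}

# The three substitution groups as listed in Table 2.
# Each dict maps a reference base to the substituted (read) base.
SUBSTITUTION_CLASSES = [
    {"A": "C", "C": "G", "G": "T", "T": "A"},
    {"A": "T", "C": "T", "G": "A", "T": "C"},
    {"A": "G", "C": "A", "G": "C", "T": "G"},
]


def encode_substitution(ref_base, read_base):
    """Return the symbol (0, 1, or 2) for a ref -> read substitution."""
    for symbol, mapping in enumerate(SUBSTITUTION_CLASSES):
        if mapping[ref_base] == read_base:
            return symbol
    raise ValueError(f"{ref_base} -> {read_base} is not a substitution")


def decode_substitution(ref_base, symbol):
    """Recover the read base from the reference base and the symbol."""
    return SUBSTITUTION_CLASSES[symbol][ref_base]


def entropy_bits(probs, normalize=True):
    """Shannon entropy in bits; optionally renormalize rounded inputs."""
    values = list(probs.values())
    if normalize:
        total = sum(values)
        values = [p / total for p in values]
    return -sum(p * math.log2(p) for p in values if p > 0)


if __name__ == "__main__":
    symbol = encode_substitution("G", "A")
    print("G -> A uses symbol", symbol)
    print("decoded read base:", decode_substitution("G", symbol))
    print("entropy (bits per symbol):", entropy_bits(TABLE2))
```

Decoding works because, for any fixed reference base, the three groups name three different substituted bases, so the symbol alone is unambiguous once the reference is known. Merging all runs of 10 or more Ns into one symbol, as in the table, can only lower the computed entropy relative to giving each longer run its own symbol.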