2018 Oct 16;19:382. doi: 10.1186/s12859-018-2407-8

Table 1.

Characteristics of the experimental datasets. n is the number of mutated positions and k is the number of residues at each position

| Dataset | Size of dataset | n | k | Theoretical size of sequence space | Length of protein sequence |
|---|---|---|---|---|---|
| Cyt P450 | 242 | 8 | 3 | 6561 | 464–466 |
| GLP-2 | 31 | 31 | 2 | 2.147 billion | 33 |
| Enterotoxin | 12 | 40 | 2 | 1099.5 billion | 233 |
| TNF | 21 | 17 | [2, 7, 4, 6, 2, 9, 9, 9, 9, 9, 2, 2, 2, 2, 6, 8, 7] | 213.3 billion | 157 |

The theoretical size of sequence space S is calculated as the product of the k values over all n mutated positions.
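As a quick check of the product rule in the footnote, the short Python sketch below recomputes S for each dataset from the n and k values in Table 1. The function name `sequence_space_size` and the `datasets` dictionary are illustrative and not taken from the article. For datasets with a single k, the list is assumed to be that k repeated n times.

```python
from math import prod

def sequence_space_size(k_values):
    """Return S, the product of the number of residues k at each mutated position."""
    return prod(k_values)

# k values per mutated position, taken from Table 1.
# Where the table gives a single k, it is repeated n times (assumption).
datasets = {
    "Cyt P450": [3] * 8,
    "GLP-2": [2] * 31,
    "Enterotoxin": [2] * 40,
    "TNF": [2, 7, 4, 6, 2, 9, 9, 9, 9, 9, 2, 2, 2, 2, 6, 8, 7],
}

sizes = {name: sequence_space_size(ks) for name, ks in datasets.items()}

for name, size in sizes.items():
    # Thousands separators make the large values easier to compare with the table.
    print(f"{name}: n = {len(datasets[name])}, S = {size:,}")
```

The results (6561; 2,147,483,648; 1,099,511,627,776; 213,324,668,928) agree with the rounded values reported in the table.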