All proteins with an InterPro (Hunter et al., 2012) serine recombinase catalytic domain (IPR006119; 35,076 entries) were clustered (13,019 clusters). The mean protein length of each cluster was computed, and the distribution of these mean lengths is presented here as a histogram. The list of putative serine recombinases was assembled with a custom script that scanned the entire InterPro “Protein matched complete” XML flatfile (∼75 GiB uncompressed; downloaded on February 14, 2014) for proteins with an IPR006119 domain. Protein sequences were downloaded from UniProt (UniProt Consortium, 2012) and validated by comparing their CRC64 checksums with those recorded in InterPro. Clustering was performed with CD-HIT (Li and Godzik, 2006) version 4.6.1 using the following parameters: 95% identity cutoff, 95% size cutoff, and a word size of five. Because the smallest characterized serine integrases (A118 and U153, accession numbers Q9T193 and Q8LTD8, respectively) are both 452 residues long, we estimate from this distribution that there are at least 4,000 unique putative large serine recombinases in the InterPro database as of February 14, 2014.
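The per-cluster mean length and the count of clusters at or above the 452-residue threshold can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script used for this analysis: it assumes the standard CD-HIT `.clstr` output layout for protein input, the function names `cluster_mean_lengths` and `count_at_least` and the toy sequence identifiers are hypothetical, and the command line shown in the comment assumes that the 95% size cutoff corresponds to CD-HIT's `-s` option.

```python
import re
from statistics import mean

# Assumed (not confirmed by the source) CD-HIT invocation for the stated parameters:
#   cd-hit -i proteins.fasta -o clusters -c 0.95 -s 0.95 -n 5
# CD-HIT would then write cluster membership to "clusters.clstr".

# Matches a member line such as "0\t452aa, >toyA... *" and captures the length.
MEMBER = re.compile(r"^\d+\s+(\d+)aa,\s+>(\S+?)\.\.\.")


def cluster_mean_lengths(lines):
    """Return the mean member length of each cluster in a .clstr listing."""
    means, current = [], []
    for line in lines:
        if line.startswith(">Cluster"):
            # A new cluster header closes the previous cluster, if any.
            if current:
                means.append(mean(current))
            current = []
        else:
            match = MEMBER.match(line)
            if match:
                current.append(int(match.group(1)))
    if current:
        means.append(mean(current))
    return means


def count_at_least(means, min_length=452):
    """Count clusters whose mean length is at least min_length residues."""
    return sum(1 for m in means if m >= min_length)


# Toy data with made-up identifiers and lengths, for illustration only.
TOY_CLSTR = (
    ">Cluster 0\n"
    "0\t452aa, >toyA... *\n"
    "1\t460aa, >toyB... at 96.00%\n"
    ">Cluster 1\n"
    "0\t183aa, >toyC... *\n"
    ">Cluster 2\n"
    "0\t500aa, >toyD... at 97.10%\n"
    "1\t510aa, >toyE... *\n"
    "2\t490aa, >toyF... at 95.50%\n"
)

if __name__ == "__main__":
    toy_means = cluster_mean_lengths(TOY_CLSTR.splitlines())
    print(count_at_least(toy_means))  # 2
```

On a real `.clstr` file, the list returned by `cluster_mean_lengths` is what a histogram of cluster mean lengths would be drawn from, and `count_at_least` gives the number of clusters at or above the chosen length threshold.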