Overview of the data set’s complexity. The data set consisted mostly of deletion events, followed in decreasing order of frequency by insertion events, complex events, and exact matches (i.e., error-free repair). A cursory examination shows that each event class contains a diverse set of repair events (distinguished by the size of the insertion or deletion), and that the sizes tend to follow a normal distribution from smallest to largest (although, because the scale here is logarithmic, the normal shape is not immediately obvious).