Constructing a multiple alignment. (A) Constructing a row of the crude multiple alignment. One of the secondary sequences (e.g. sequence r) consists of two contigs. The pairwise alignments between the reference sequence and the two contigs are shown in a dot-plot format, in which the positions of each local alignment are plotted as a series of diagonal lines. For clarity, the four major local alignments are numbered and enclosed in shaded parallelograms. To construct a row in the crude multiple alignment, the local alignments are pruned so that each position in the reference sequence is aligned at most once. In this illustration, interval a–b is aligned to the reverse complement of B–A, b–c is aligned to B–C, c–d is aligned to C′–D, and e–g is aligned to E–G. Pruning is needed because some positions in the reference sequence, e.g. those just before b, are aligned more than once. Extraneous matches to an improperly masked repetitive element around position f are discarded. Row r of the crude multiple alignment is constructed from the aligned intervals listed above. Gaps within a local pairwise alignment, say between a and b, result in ‘internal gaps’ in row r of the multiple alignment, which are penalized. A region between aligned segments (e.g. region z–a or d–e) is considered an ‘end-gap’ and is not penalized. Note that segment E–D of the secondary sequence appears twice in row r. (B) Refinement of the multiple alignment. One cycle of the refinement process is shown schematically. The crude multiple alignment is shown as a series of rows with thick lines representing strings of nucleotides; gaps are spaces in the rows. A subalignment between positions i and j is extracted and row r is removed from it. The subalignment and row r are reduced by removing gaps as described in the Methods, and a new alignment is computed between the sequence in row r and the reduced subalignment (without row r).
If this process improves the alignment score, then the new subalignment is spliced back into the large alignment. This process is repeated for all sub-regions where the alignment's columns have changed.
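The pruning step in (A), in which each reference position ends up aligned at most once, can be illustrated with a minimal sketch. The caption does not state how competing local alignments are resolved, so the rule used here (keep higher-scoring alignments first and trim lower-scoring ones to the reference positions still uncovered) is an assumption, as are the names `LocalAlignment` and `prune_to_single_coverage`. For brevity the sketch tracks only reference coordinates; a real implementation would also have to trim the corresponding secondary-sequence coordinates.

```python
from typing import List, NamedTuple


class LocalAlignment(NamedTuple):
    """Hypothetical record for one local alignment (reference side only)."""
    ref_start: int  # first reference position covered (inclusive)
    ref_end: int    # one past the last reference position covered
    score: float
    label: str


def prune_to_single_coverage(
    alignments: List[LocalAlignment], min_len: int = 1
) -> List[LocalAlignment]:
    """Greedily prune so that every reference position is covered at most once.

    Assumed rule: process alignments from highest to lowest score; each one
    keeps only the parts of its reference interval not already claimed.
    Pieces shorter than min_len are discarded.
    """
    kept: List[LocalAlignment] = []
    for aln in sorted(alignments, key=lambda a: a.score, reverse=True):
        pieces = [(aln.ref_start, aln.ref_end)]
        # Subtract every interval that has already been kept.
        for k in kept:
            remaining = []
            for start, end in pieces:
                if k.ref_end <= start or k.ref_start >= end:
                    remaining.append((start, end))  # no overlap
                else:
                    if start < k.ref_start:
                        remaining.append((start, k.ref_start))
                    if k.ref_end < end:
                        remaining.append((k.ref_end, end))
            pieces = remaining
        for start, end in pieces:
            if end - start >= min_len:
                # Trimmed pieces inherit the score of the parent alignment.
                kept.append(LocalAlignment(start, end, aln.score, aln.label))
    return sorted(kept, key=lambda a: a.ref_start)


example = [
    LocalAlignment(0, 100, 50.0, "1"),         # overlaps "2" just before 100
    LocalAlignment(90, 200, 80.0, "2"),
    LocalAlignment(150, 160, 5.0, "repeat"),   # weak match inside "2"
]
pruned = prune_to_single_coverage(example)
for aln in pruned:
    print(aln.label, aln.ref_start, aln.ref_end)
```

In this toy input, alignment "1" is trimmed where it overlaps the higher-scoring alignment "2", and the weak match lying entirely inside "2" is dropped, loosely mirroring the trimming near b and the discarded repeat matches near f. Reference positions left uncovered between kept intervals would correspond to the unpenalized end-gaps.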