2019 Nov 20;20:246. doi: 10.1186/s13059-019-1828-7

Table 2.

Glossary. Here, positive (P) or negative (N) describes whether an SV was detected (called), and true (T) or false (F) describes whether that call was correct. Thus, SVs that are present in the sample are true positives (TP) if they are called or false negatives (FN) if they are not called. Conversely, SVs that are not in the sample are true negatives (TN) if they are not called or false positives (FP) if they are called.

Word Definition
Accuracy Proportion of correctly classified events (TP + TN) among all events: (TP + TN)/(TP + TN + FP + FN).
Breakpoints Positions on the genome denoting the start and end of SVs relative to the reference genome.
Contigs Contiguous sequence stretches assembled from reads.
De Bruijn graph Directed graph in which every node has exactly n incoming and n outgoing edges (n being the size of the alphabet). In genome assembly, a de Bruijn graph is built in which the nodes are k-mers (sequences of length k) and the edges correspond to overlaps of k − 1 bases between nodes.
String graph-based assembly Method similar to de Bruijn graph-based assembly, but here the overlaps between all pairs of reads (instead of k-mers) are computed, and a string graph is constructed from these overlaps.
Insert size The distance between the two reads of a paired-end read pair.
Overhang Portion of a mapped read that cannot be aligned and thus could indicate a structural variation.
Phasing Determining whether two or more heterozygous variations co-occur on the same DNA molecule or on different DNA molecules.
Precision (or positive predictive value) Proportion of predictions (FP + TP) that are correct (TP): TP/(TP + FP).
Recall (or sensitivity or true-positive rate) Proportion of the total positives (FN + TP) that were correctly identified (TP): TP/(TP + FN).
Scaffold Connected contiguous sequence stretches, with unresolved sequence stretches in between.
Split reads Reads containing parts that map to different loci on the reference genome. They are found by splitting a read into sub-segments, aligning each sub-segment individually, and then grouping the sub-segments that come from the same read.
Tandem sequence A specific type of repetitive region in which copies of a sequence are repeated directly adjacent to each other.
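
The Accuracy, Precision, and Recall entries above all reduce to simple ratios of the four counts TP, TN, FP, and FN. The following is a minimal Python sketch of those formulas. The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they do not come from the article.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return accuracy, precision, and recall from the four outcome counts."""
    total = tp + tn + fp + fn   # all events
    called = tp + fp            # everything that was called (predictions)
    present = tp + fn           # everything actually present in the sample
    nan = float("nan")          # returned when a ratio is undefined (zero denominator)
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "precision": tp / called if called else nan,
        "recall": tp / present if present else nan,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, tn=900, fp=20, fn=10)
print(metrics)
```

With these made-up counts, precision is 80/100 and recall is 80/90, while accuracy is dominated by the large TN count.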
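
The De Bruijn graph entry above describes nodes as k-mers joined by edges that represent an overlap of k − 1 bases. A toy Python sketch of that construction could look as follows. It is a simplification under stated assumptions: it only links k-mers that are consecutive within the same read, it ignores sequencing errors and reverse complements, and the function name `build_de_bruijn` and the example reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read.

    Consecutive k-mers in a read overlap by k - 1 bases by construction,
    so every stored edge satisfies the overlap condition in the glossary.
    """
    graph = defaultdict(set)
    for read in reads:
        # All k-mers of the read, in order; empty if the read is shorter than k.
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return dict(graph)


# Toy reads, for illustration only.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", sorted(successors))
```

Storing successors in a set collapses the edge CGT to GTC, which is observed in both reads, into a single edge; an assembler that tracks coverage would keep a count instead.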