Table 2.
Glossary. Here, positive (P) or negative (N) describes the outcome of SV detection (or SV calling), and true (T) or false (F) describes whether the call was correct. Thus, SVs that are present in the sample are true positives (TP) if they are called or false negatives (FN) if they are not called. Conversely, SVs that are not in the sample are true negatives (TN) if they are not called or false positives (FP) if they are called.
Word | Definition |
---|---|
Accuracy | Proportion of correctly identified events (T) to the overall events: (TP + TN)/(TP + TN + FP + FN). |
Breakpoints | Positions on the genome denoting the start and end of SVs relative to the reference genome. |
Contigs | Contiguous sequence stretches assembled from reads. |
De Bruijn graph | Directed graph consisting of nodes with exactly n incoming and n outgoing edges. In genome assembly, a de Bruijn graph is built in which the nodes are k-mers (sequences of length k) and the edges correspond to overlaps of k − 1 bases between nodes. |
String graph-based assembly | Method similar to de Bruijn graph-based assembly, but here the overlaps between all pairs of reads (instead of k-mers) are computed and used to construct a string graph. |
Insert size | The distance between the two paired-end reads. |
Overhang | Portion of a mapped read that cannot be aligned and thus could indicate a structural variation. |
Phasing | Determining whether two or more heterozygous variations co-occur on the same DNA molecule or lie on different DNA molecules. |
Precision (or positive predictive value) | Proportion of predictions (FP + TP) that are correct (TP): TP/(TP + FP). |
Recall (or sensitivity or true-positive rate) | Proportion of the total positives (FN + TP) that were correctly identified (TP): TP/(TP + FN). |
Scaffold | Connected contiguous sequence stretches, with unresolved sequence stretches in between. |
Split reads | Reads containing parts that map to different loci on the reference genome. They are found by splitting the read into sub-segments, aligning each sub-segment individually, and then grouping the sub-segments that come from the same read. |
Tandem sequence | A specific type of repetitive region in which the copies of the repeated unit lie directly adjacent to one another. |
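
The definitions of accuracy, precision, and recall given in the glossary can be illustrated with a minimal Python sketch. The `Counts` class, the function names, and the example counts are hypothetical and chosen only for illustration; they do not come from any SV caller or benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical container for the four outcome counts of an SV call set."""
    tp: int  # SVs present in the sample and called
    fp: int  # SVs absent from the sample but called
    tn: int  # SVs absent from the sample and not called
    fn: int  # SVs present in the sample but not called


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def accuracy(c: Counts) -> float:
    # (TP + TN) / (TP + TN + FP + FN)
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def precision(c: Counts) -> float:
    # TP / (TP + FP): fraction of calls that are correct
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Counts) -> float:
    # TP / (TP + FN): fraction of true SVs that were called
    return _ratio(c.tp, c.tp + c.fn)


# Made-up counts, for illustration only.
counts = Counts(tp=80, fp=20, tn=890, fn=10)
print(f"accuracy={accuracy(counts):.3f}")
print(f"precision={precision(counts):.3f}")
print(f"recall={recall(counts):.3f}")
```

With these made-up counts, precision is 80/100 and recall is 80/90, so a call set can score well on one metric while doing worse on the other.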