Accuracy |
Proportion of correctly identified events (TP + TN) among all events: (TP + TN)/(TP + TN + FP + FN). |
Breakpoints |
Positions on the genome denoting the start and end of SVs relative to the reference genome. |
Contigs |
Contiguous sequence stretches assembled from reads. |
De Bruijn graph |
Directed graph consisting of nodes with exactly n incoming and n outgoing edges. In genome assemblies, a de Bruijn graph is built in which the nodes are k-mers (sequences of length k) and the edges correspond to an overlap of k − 1 bases between nodes. |
String graph-based assembly |
Method similar to de Bruijn graph-based assembly, except that overlaps are computed between all pairs of reads (instead of between k-mers) and used to construct a string graph. |
Insert size |
The distance between the two reads of a paired-end pair. |
Overhang |
Portion of a mapped read that cannot be aligned and thus could indicate a structural variation. |
Phasing |
The identification of whether two or more heterozygous variations co-occur on the same DNA molecule or on different DNA molecules. |
Precision (or positive predictive value) |
Proportion of predictions (FP + TP) that are correct (TP). |
Recall (or sensitivity or true-positive rate) |
Proportion of the total positives (FN + TP) that were correctly identified (TP). |
Scaffold |
Connected contiguous sequence stretches, with unresolved sequence stretches in between. |
Split reads |
Reads containing parts that map to different loci on the reference genome. They are found by splitting the read into sub-segments, aligning each sub-segment individually, and then grouping the sub-segments that originate from one read. |
Tandem sequence |
A specific type of repetitive region in which the copies of the repeated unit lie directly adjacent to each other.
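
The accuracy, precision and recall entries above can be illustrated with a short Python sketch. The function name `classification_metrics` and the example counts are assumptions chosen for illustration only; they do not come from any particular SV caller or benchmark.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision and recall from confusion-matrix counts.

    A ratio whose denominator is zero is returned as NaN rather than raising.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else math.nan

    return {
        # (TP + TN) / (TP + TN + FP + FN)
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        # TP / (FP + TP): share of predictions that are correct
        "precision": ratio(tp, fp + tp),
        # TP / (FN + TP): share of true events that were found
        "recall": ratio(tp, fn + tp),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=3, tn=4, fp=1, fn=2)
print(metrics)
```

With these illustrative counts, 7 of the 10 events are classified correctly, 3 of the 4 predictions are correct, and 3 of the 5 true events are recovered.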
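
The de Bruijn graph entry above can likewise be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the construction used by any specific assembler: the function name `build_de_bruijn_graph` is hypothetical, reads are assumed to be error-free strings, and an edge is added only between k-mers that are adjacent within a read (which guarantees an overlap of k − 1 bases).

```python
def build_de_bruijn_graph(reads, k):
    """Build a simplified de Bruijn graph as an adjacency mapping.

    Nodes are k-mers; an edge u -> v is added when v directly follows u
    in some read, so the last k - 1 bases of u equal the first k - 1 of v.
    """
    graph = {}
    for read in reads:
        # All k-mers of the read, in order; empty if the read is shorter than k.
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            graph.setdefault(kmer, set())
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    # Sorted lists make the result deterministic and easy to inspect.
    return {node: sorted(successors) for node, successors in graph.items()}


# Toy read, for illustration only.
graph = build_de_bruijn_graph(["ACGTA"], k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

For the toy read `ACGTA` with k = 3, the nodes are `ACG`, `CGT` and `GTA`, linked in that order. Contigs correspond to unbranched paths through such a graph.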