2019 Jul;29(7):1134–1143. doi: 10.1101/gr.245928.118

Figure 1.


Overview of the CN-Learn pipeline. The CN-Learn pipeline consists of preprocessing steps (Stages 1 and 2), followed by building the classifier using training data and discriminatory features, and finally running the classifier on the test data. The complete pipeline is outlined as follows. Stage 1: CNV predictions were made using four exome-based CNV callers. Although CANOES, CODEX, CLAMMS, and XHMM were used in this study, a generic pipeline can be constructed with a different set or number of callers. Breakpoints of overlapping calls from multiple callers were then resolved. Stage 2: Breakpoint-resolved CNVs were labeled as “True” or “False” based on the overlap with “gold standard” calls and subsequently used to train CN-Learn. Stage 3: Caller-specific and genomic features were extracted for the labeled CNVs in the training and testing sets. Stage 4: CN-Learn was trained as a Random Forest classifier using the extracted features of the CNVs in the training set to make predictions on the CNVs from the testing set.
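The Stage 2 labeling step can be illustrated with a minimal sketch. It is not the CN-Learn implementation: the `Cnv` record, the function names, the half-open coordinate convention, and the 0.5 overlap threshold are all assumptions made for illustration, since the caption does not state how overlap with the "gold standard" calls was measured.

```python
from typing import List, NamedTuple


class Cnv(NamedTuple):
    """Hypothetical CNV record; coordinates are assumed half-open [start, end)."""
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def overlap_fraction(call: Cnv, gold: Cnv) -> float:
    """Fraction of `call` covered by a single `gold` interval.

    Returns 0.0 when the chromosome or CNV type differs, or when the
    call has no positive length.
    """
    length = call.end - call.start
    if length <= 0 or call.chrom != gold.chrom or call.kind != gold.kind:
        return 0.0
    shared = min(call.end, gold.end) - max(call.start, gold.start)
    return max(shared, 0) / length


def label_calls(calls: List[Cnv], gold_calls: List[Cnv],
                min_overlap: float = 0.5) -> List[bool]:
    """Label each call True if any one gold call covers at least
    `min_overlap` of it (assumed threshold), else False.

    Coverage is checked per gold call; gold calls are not merged first.
    """
    return [
        any(overlap_fraction(c, g) >= min_overlap for g in gold_calls)
        for c in calls
    ]


calls = [
    Cnv("chr1", 100, 200, "DEL"),
    Cnv("chr1", 500, 600, "DUP"),
    Cnv("chr2", 100, 200, "DEL"),
]
gold = [
    Cnv("chr1", 150, 400, "DEL"),
    Cnv("chr1", 590, 700, "DUP"),
]
labels = label_calls(calls, gold)
print(labels)  # [True, False, False]
```

In this toy input, the first call is half covered by a gold deletion and is labeled True, the second shares only 10% of its length with a gold duplication, and the third has no gold call on its chromosome. The resulting True/False labels play the role of the training targets described in Stage 2.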