2015 May 30;4:e06602. doi: 10.7554/eLife.06602

Figure 1. Schematic representation, validation and enrichment of genome-wide siRNA cell screen for machine learning approach.

(A) High-content small interfering RNA (siRNA) cell-based screen using reverse transfection of the library in serum-containing media for 72 hr, followed by 24 hr of serum starvation, fixation and DAPI staining. Fluorescence imaging and algorithmic analysis were then performed for all pooled siRNAs. To assess ciliary candidates, we used the SYSCILIA gold standard (SCGSv1) for positive training and the Human Metabolome Database (HMDB 3.0) as well as a manually curated housekeeping gene data set for negative training. FDR, false discovery rate. (B) Segmentation algorithm for cytoplasm and cilia detection: (1) nuclei detected from the DAPI channel, (2) automated nuclear segmentation, (3) automated cell outlining using cytoplasm_detection_D of the program Acapella, and (4) automated cilia detection and segmentation. Images have been modified for illustration purposes. Scale bar: 10 μm. (C) Representative images of serum-starved SEMG cells without siRNA showing basal ciliation (small green rods in the EGFP channel). Red (mCherry) marks cells in the S/G2/M phases of the cell cycle, green (EGFP) marks cilia, and blue (DAPI) marks nuclei. siRNAs used as positive controls: KIF3A siRNA interferes with ciliation but not the cell cycle; ACTR3 siRNA increases cilia length (Kim et al., 2010); CRNKL1, implicated in cell cycle progression (Zhang et al., 1991), showed increased mCherry-positive nuclei and reduced ciliation. Scale bar: 10 μm. (D) Receiver operating characteristic (ROC) curve for the classifier, which used features from three data sources. Dashed line: theoretical random classifier. (E) Precision-recall curve for the final classifier. (F) Box plot showing the median (red center bar) and interquartile range (blue box) of the classifier scores for the corresponding number of supporting evidences (NOEs) in Cildb and for the genes used as negative and positive training examples. The indicated contrasts (*) were significant, with a highest p-value of p < 1.03 × 10⁻⁴ (one-tailed Wilcoxon rank-sum test). (G) Same as (F), limited to the NOEs from humans only. The indicated contrasts (*) were significant, with a highest p-value of p < 1.43 × 10⁻¹⁰ (one-tailed Wilcoxon rank-sum test). See Figure 1—figure supplements 1 and 2 for the prediction scores on the gold standard and candidates as well as the improvement of the ROC and precision-recall curves.
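The one-tailed rank-sum contrasts reported for panels (F) and (G) can be illustrated with a minimal sketch. It assumes a normal approximation without tie or continuity correction; the function names and the toy scores are hypothetical, and the software the authors actually used is not stated in the caption.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_greater(x, y):
    """One-tailed test of H1: values in x tend to exceed values in y.

    Returns (U, p), where U is the Mann-Whitney statistic for x and p comes
    from a normal approximation (an assumption made for this sketch).
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    r1 = sum(ranks[:n1])                 # rank sum of the first group
    u = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u, p

# Hypothetical classifier scores for two groups of genes.
high_noe = [0.8, 0.9, 0.7, 0.95, 0.85]
low_noe = [0.1, 0.2, 0.3, 0.15, 0.25]
print(rank_sum_greater(high_noe, low_noe))
```

For small groups or heavily tied scores, an exact or tie-corrected test would be more appropriate than this approximation.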

DOI: http://dx.doi.org/10.7554/eLife.06602.003

Figure 1—figure supplement 1. Prediction score on Gold standard and Gold standard candidates.

(A–C) Box plots reporting the median (red center bar) and interquartile range (blue box) of the classifier scores for gold standard positive and negative genes (out-of-bag performance; that is, for every gene the score excludes trees in which the gene was used for training). Boxes are also shown for a set of ciliopathy candidate genes (SYSCILIA candidate genes) and for genes not annotated as ciliopathy related (Unknown), which were not used in the training. (A) Classifier based on cilia siRNA screen features only. (B) Classifier based on cilia siRNA screen and centriole siRNA screen features only. (C) Classifier including all siRNA and GTEx project expression signature-based features. In all cases, the median value for the positive set or candidate genes differed significantly from that of the negative set or unknown set of genes (one-tailed Wilcoxon rank-sum test).
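The out-of-bag idea described above (a gene's score uses only the trees that did not see that gene during training) can be sketched as follows. This is a toy illustration, not the authors' classifier: a one-feature threshold stump stands in for each tree, and `fit_stump`, `oob_scores`, and the data are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Toy stand-in for a tree: a threshold halfway between the class means.

    Predicts the positive class when x > threshold.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:
        # Degenerate bootstrap sample with one class: fall back to the mean.
        return sum(xs) / len(xs)
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def oob_scores(xs, ys, n_trees=200, seed=0):
    """Score each sample using only the stumps whose bootstrap excluded it."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [0] * n    # positive votes from out-of-bag stumps
    counts = [0] * n   # number of stumps for which the sample was out of bag
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        thr = fit_stump([xs[i] for i in in_bag], [ys[i] for i in in_bag])
        in_bag_set = set(in_bag)
        for i in range(n):
            if i not in in_bag_set:
                votes[i] += 1 if xs[i] > thr else 0
                counts[i] += 1
    # None marks a sample that was never out of bag.
    return [v / c if c else None for v, c in zip(votes, counts)]

# Hypothetical single-feature data: four negatives, four positives.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(oob_scores(xs, ys))
```

Because every training gene is scored only by models that never saw it, the resulting scores can be compared with those of genes held out of training entirely, such as the candidate and Unknown sets.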
Figure 1—figure supplement 2. Visible improvement of ROC curve and precision-recall curve.

(A) ROC curves for classifiers trained on different partitions of the feature space (blue: final set; magenta: excluding centriole biogenesis siRNA-based features; red: including only features from the whole-genome siRNA screen performed in this study). The dashed black line corresponds to a theoretical random classifier. (B) As in (A), but showing the precision-recall curve for each classifier.
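As a rough illustration of how ROC and precision-recall curves like these are derived from classifier scores, the sketch below sweeps a threshold from the highest score downward. It assumes binary labels and no tied scores; the function names and example values are hypothetical and do not come from the study.

```python
def roc_and_pr(scores, labels):
    """Return ROC points (FPR, TPR) and PR points (recall, precision).

    Each sample, taken in order of decreasing score, lowers the threshold
    by one step. Assumes labels are 0/1 and scores are untied.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    roc = [(0.0, 0.0)]
    pr = []
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores: 1 = positive training gene, 0 = negative training gene.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
roc, pr = roc_and_pr(scores, labels)
print(auc(roc))                 # area under the ROC curve
print(auc([(0, 0), (1, 1)]))    # the diagonal of a random classifier
```

Running the same functions on scores from classifiers trained on different feature subsets would give one curve per classifier, which is the kind of comparison the panels show.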