Nat Methods. 2019 Oct 21;16(12):1247–1253. doi: 10.1038/s41592-019-0612-7

Table 1.

Comparison of performance of the top three methodologies

Team | Core model | Competition score | Average F1 | Recall at 0.7 IoU (%) | Missed at 0.7 IoU (%) | Extra at 0.7 IoU (%)
[ods.ai] topcoders | 32× U-Net/FPN | 0.6316 | 0.7120 | 77.62 | 22.38 | 14.55
Jacobkie | 1× FC-FPN | 0.6147 | 0.6987 | 69.14 | 30.86 | 15.04
Deep Retina | 1× Mask-RCNN | 0.6141 | 0.7008 | 68.07 | 31.93 | 10.90
CellProfiler^a | - | 0.5281 | 0.6280 | 59.35 | 40.65 | 39.55

Rows show information about each method and columns show performance metrics. Core model, type of machine-learning algorithm used to solve the task, with the number indicating how many neural networks were used in the solution. The names of neural networks are explained in the main text. Competition score, metric used during the competition to rank participants in the scoreboard (https://www.kaggle.com/c/data-science-bowl-2018#evaluation). The remaining performance metrics were computed offline after the competition ended, for analysis purposes only. Average F1 is an accuracy metric closely related to the official score; it treats segmentation as a binary decision (correctly segmented or not) for each object. F1 scores were computed at different IoU thresholds between target masks and estimated segmentations and then averaged across all thresholds. By setting a single IoU threshold, we could count how many objects were correctly segmented (true positives, or Recall at 0.7 IoU), how many were missed (false negatives, or Missed at 0.7 IoU) and how many false objects were introduced (false positives, or Extra at 0.7 IoU).

^a Note that the CellProfiler reference segmentations were generated with a different experimental protocol involving manual adjustment of pipelines for five image types in the test set. More details are provided in the Methods section.
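To make the offline metrics concrete, the sketch below shows one way to compute them for a single image. It is not the authors' evaluation code or the official Kaggle scorer; the function names, the greedy IoU matching strategy and the 0.5–0.95 threshold range are illustrative assumptions, and objects are assumed to be given as binary NumPy masks of equal shape.

```python
# Illustrative sketch (not the official evaluation code) of the offline metrics in
# Table 1, assuming ground-truth and predicted nuclei are lists of binary masks.
import numpy as np


def iou_matrix(true_masks, pred_masks):
    """Pairwise intersection-over-union between ground-truth and predicted masks."""
    ious = np.zeros((len(true_masks), len(pred_masks)))
    for i, t in enumerate(true_masks):
        for j, p in enumerate(pred_masks):
            inter = np.logical_and(t, p).sum()
            union = np.logical_or(t, p).sum()
            ious[i, j] = inter / union if union > 0 else 0.0
    return ious


def match_counts(ious, threshold):
    """True positives, missed objects and extra objects at one IoU threshold."""
    # Greedy one-to-one matching: each ground-truth object takes the best
    # still-unmatched prediction whose IoU meets the threshold.
    matched_pred = set()
    tp = 0
    for i in range(ious.shape[0]):
        candidates = [j for j in np.argsort(-ious[i])
                      if j not in matched_pred and ious[i, j] >= threshold]
        if candidates:
            matched_pred.add(candidates[0])
            tp += 1
    fn = ious.shape[0] - tp   # missed ground-truth objects ("Missed")
    fp = ious.shape[1] - tp   # spurious predicted objects ("Extra")
    return tp, fn, fp


def average_f1(true_masks, pred_masks, thresholds=np.arange(0.5, 1.0, 0.05)):
    """F1 averaged over a range of IoU thresholds (cf. the 'Average F1' column)."""
    ious = iou_matrix(true_masks, pred_masks)
    f1_scores = []
    for t in thresholds:
        tp, fn, fp = match_counts(ious, t)
        denom = 2 * tp + fn + fp
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_scores))


if __name__ == "__main__":
    # Toy example: one correctly segmented object, one missed, one extra prediction.
    a = np.zeros((20, 20), bool); a[2:8, 2:8] = True
    b = np.zeros((20, 20), bool); b[12:18, 12:18] = True
    c = np.zeros((20, 20), bool); c[2:8, 2:8] = True       # matches a
    d = np.zeros((20, 20), bool); d[12:18, 2:8] = True     # spurious object
    ious = iou_matrix([a, b], [c, d])
    tp, fn, fp = match_counts(ious, threshold=0.7)
    print("recall@0.7:", tp / (tp + fn), "missed:", fn, "extra:", fp)
    print("average F1:", average_f1([a, b], [c, d]))
```

The official competition score linked above is a related but distinct quantity (it averages a precision-like ratio over IoU thresholds), so the numbers produced by this sketch correspond to the offline analysis columns, not to the scoreboard metric.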