Figure 2.

Clustering 1.1 billion taxi locations in New York City. The dataset contains 1 133 769 628 two-dimensional GPS locations (see Methods). (a) Visualization of WFC and k-means results. The cluster numbers for k-means were set to match those identified by WFC. (b) Running time and usability of clustering algorithms on datasets of different sizes using centralized computing. Datasets of different sizes were obtained by slicing the full dataset with varying time windows (see Methods). WFC (Total) and WFC (Ave.) denote the total and the average per-scale running time of WFC, respectively. As the dataset size increases, a growing number of methods fail computationally; failed runs are not plotted. (c) Running times of WFC and k-means using distributed computing. Results were obtained from 10 runs of each algorithm; error bars indicate the standard error of the mean.
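
As an illustration of the error bars in panel (c), the following minimal sketch computes the mean and the standard error of the mean (SEM) over repeated runs. It assumes the SEM is the sample standard deviation (n - 1 denominator) divided by the square root of the number of runs; the caption does not state which estimator was used. The function name `mean_and_sem` and the running times are hypothetical and are not taken from the paper.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of measurements.

    Assumption: SEM = sample standard deviation (n - 1 denominator) / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two runs are needed to estimate the SEM")
    mean = statistics.fmean(values)
    sample_sd = statistics.stdev(values)  # uses the n - 1 denominator
    return mean, sample_sd / math.sqrt(n)


# Hypothetical running times (seconds) for 10 runs; NOT data from the paper.
run_times = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
mean, sem = mean_and_sem(run_times)
print(f"mean = {mean:.3f} s, SEM = {sem:.3f} s")
```

With 10 runs per algorithm, the error bar half-length under this assumption is the sample standard deviation divided by the square root of 10.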