Table 5.
Scale | Twitter search | Twitter filtered stream | Twitter sampled stream | Twitter search vs. Twitter filtered stream | Twitter search vs. Twitter sampled stream | Twitter filtered stream vs. Twitter sampled stream | Overlapping across the three datasets |
---|---|---|---|---|---|---|---|
Comparison of all tweets of each dataset | |||||||
Minute<sup>a</sup> | 52.3% (45.1%, 59.6%) | 48.2% (44.7%, 51.6%) | 0.8% (0.7%, 0.8%) | 28.6% (23.4%, 33.3%) | 0.4% (0.3%, 0.5%) | 0.6% (0.6%, 0.7%) | 0.4% (0.3%, 0.4%) |
Minute<sup>b</sup> | 72.7% (49.7%, 95.7%) | 64.5% (31.8%, 97.2%) | 1.6% (1.3%, 1.9%) | 37.7% (23.1%, 52.4%) | 0.6% (0.2%, 1.0%) | 0.6% (0.5%, 0.8%) | 0.6% (0.1%, 1.0%) |
Hour<sup>b</sup> | 72.9% (51.2%, 94.6%) | 54.7% (17.8%, 91.6%) | 1.8% (0.9%, 2.7%) | 28.4% (9.0%, 47.8%) | 0.5% (0.3%, 0.7%) | 0.6% (0.4%, 0.7%) | 0.4% (0.1%, 0.7%) |
Day<sup>b</sup> | 70.5% (60.1%, 80.9%) | 53.4% (28.2%, 78.6%) | 1.9% (1.0%, 2.9%) | 25.6% (8.4%, 42.8%) | 0.5% (0.3%, 0.7%) | 0.7% (0.5%, 0.7%) | 0.4% (0.1%, 0.7%) |
Week<sup>b</sup> | 70.4% (65.6%, 75.2%) | 54.0% (37.6%, 70.4%) | 1.9% (1.3%, 2.5%) | 27.5% (16%, 39%) | 0.5% (0.2%, 0.7%) | 0.6% (0.4%, 0.7%) | 0.4% (0.2%, 0.5%) |
Comparison of the tweets with the top 10 keywords of each dataset | |||||||
Minute<sup>a</sup> | 57.1% (53.4%, 60.9%) | 52.1% (44.6%, 59.5%) | 1.0% (0.9%, 1.0%) | 31.9% (26.3%, 37.6%) | 0.5% (0.4%, 0.6%) | 0.9% (0.9%, 1.0%) | 0.5% (0.4%, 0.6%) |
Minute<sup>b</sup> | 74.4% (51.7%, 97.1%) | 69.4% (42.7%, 96.1%) | 1.2% (0.6%, 2.4%) | 44.1% (3.1%, 85.4%) | 0.6% (0, 1.0%) | 1.1% (0.5%, 1.8%) | 0.5% (0, 0.1%) |
Hour<sup>b</sup> | 77.4% (72.5%, 82.2%) | 52.9% (47.9%, 58.0%) | 1.3% (1.2%, 1.5%) | 30.3% (26.4%, 34.3%) | 0.5% (0.4%, 0.5%) | 0.8% (0.8%, 0.9%) | 0.3% (0.3%, 0.4%) |
Day<sup>b</sup> | 68.8% (42.5%, 95.1%) | 58.4% (49.6%, 67.2%) | 1.4% (0.7%, 2.1%) | 27.7% (15.3%, 40.1%) | 0.5% (0.3%, 0.7%) | 0.8% (0.5%, 1.6%) | 0.3% (0, 0.5%) |
Week<sup>b</sup> | 71.8% (57.0%, 86.6%) | 57.5% (50.1%, 64.9%) | 1.3% (0.7%, 1.9%) | 28.6% (17.6%, 36.8%) | 0.5% (0.4%, 0.6%) | 0.7% (0.6%, 0.9%) | 0.3% (0.2%, 0.4%) |
<sup>a</sup> The denominator is the “Twitter full archive” dataset.
<sup>b</sup> The denominator is the combination of the “Twitter search”, “Twitter filtered stream”, and “Twitter sampled stream” datasets.
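
The two denominators described in the footnotes can be illustrated with a minimal sketch. It assumes each dataset is available as a collection of unique tweet IDs and that each percentage is a simple ratio of counts, with pairwise and three-way overlap taken as set intersections. The function name `coverage_table` and its parameters are hypothetical, and the intervals shown in parentheses in the table are not reproduced here.

```python
def coverage_table(search, filtered, sampled, full_archive=None):
    """Percentage coverage and overlap for three collections of tweet IDs.

    Hypothetical helper, not the procedure used to produce Table 5.
    If full_archive is given, it is the denominator (footnote a);
    otherwise the union of the three datasets is used (footnote b).
    """
    search, filtered, sampled = set(search), set(filtered), set(sampled)

    if full_archive is not None:
        denominator = set(full_archive)
    else:
        denominator = search | filtered | sampled

    n = len(denominator)
    if n == 0:
        raise ValueError("denominator dataset is empty")

    def pct(ids):
        # Share of the denominator dataset, as a percentage.
        return 100.0 * len(ids) / n

    return {
        "search": pct(search),
        "filtered": pct(filtered),
        "sampled": pct(sampled),
        "search_vs_filtered": pct(search & filtered),
        "search_vs_sampled": pct(search & sampled),
        "filtered_vs_sampled": pct(filtered & sampled),
        "all_three": pct(search & filtered & sampled),
    }


# Toy IDs only; these are not data from the study.
row = coverage_table({1, 2, 3, 4}, {3, 4, 5}, {4, 6})
```

Passing `full_archive` switches the denominator to the full archive, so the same three datasets yield smaller percentages whenever the archive contains tweets that none of them captured.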