Author manuscript; available in PMC: 2016 Jan 15.
Published in final edited form as: Proceedings VLDB Endowment. 2015 Sep;8(13):2182–2193.

Table 1.

Datasets used for testing

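| Name | Description | Rows | Dimensions (A) | Measures (M) | Views | Size (MB) |
|---|---|---|---|---|---|---|
| *Synthetic datasets* | | | | | | |
| SYN | Randomly distributed, varying number of distinct values | 1M | 50 | 20 | 1000 | 411 |
| SYN*-10 | Randomly distributed, 10 distinct values per dimension | 1M | 20 | 1 | 20 | 21 |
| SYN*-100 | Randomly distributed, 100 distinct values per dimension | 1M | 20 | 1 | 20 | 21 |
| *Real datasets* | | | | | | |
| BANK | Customer loan dataset | 40K | 11 | 7 | 77 | 6.7 |
| DIAB | Hospital data about diabetic patients | 100K | 11 | 8 | 88 | 23 |
| AIR | Airline delays dataset | 6M | 12 | 9 | 108 | 974 |
| AIR10 | Airline dataset scaled 10× | 60M | 12 | 9 | 108 | 9737 |
| *Real datasets (user study)* | | | | | | |
| CENSUS | Census data | 21K | 10 | 4 | 40 | 2.7 |
| HOUSING | Housing prices | 0.5K | 4 | 10 | 40 | <1 |
| MOVIES | Movie sales | 1K | 8 | 8 | 64 | 1.2 |

Dimensions (A): number of dimension attributes, |A|. Measures (M): number of measure attributes, |M|. Views: number of candidate views; in every row it equals |A| × |M|.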
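In every row of Table 1 the Views column equals |A| × |M|, which is consistent with enumerating one view per (dimension, measure) pair. The short Python sketch below recomputes that product from the transcribed table values so the relationship can be checked directly. The names `DATASETS`, `candidate_views`, and the `num_aggs` parameter are hypothetical illustrations, not part of the paper; `num_aggs` assumes a single aggregate function per view, and that default reproduces the table.

```python
# Hypothetical sketch: recompute the Views column of Table 1 from |A| and |M|.
# Tuples are (name, |A| dimension attributes, |M| measure attributes, views as listed).
DATASETS = [
    ("SYN", 50, 20, 1000),
    ("SYN*-10", 20, 1, 20),
    ("SYN*-100", 20, 1, 20),
    ("BANK", 11, 7, 77),
    ("DIAB", 11, 8, 88),
    ("AIR", 12, 9, 108),
    ("AIR10", 12, 9, 108),
    ("CENSUS", 10, 4, 40),
    ("HOUSING", 4, 10, 40),
    ("MOVIES", 8, 8, 64),
]


def candidate_views(num_dims: int, num_measures: int, num_aggs: int = 1) -> int:
    """Count (dimension, measure, aggregate) combinations.

    Assumption: one view per combination; num_aggs=1 matches Table 1.
    """
    return num_dims * num_measures * num_aggs


if __name__ == "__main__":
    for name, a, m, views in DATASETS:
        computed = candidate_views(a, m)
        status = "ok" if computed == views else "MISMATCH"
        print(f"{name:<9} {a:>2} x {m:>2} = {computed:>4}  (table: {views:>4})  {status}")
```

Passing a larger `num_aggs` shows how the view space would grow if several aggregate functions were considered per (dimension, measure) pair. That is an assumption for illustration, not a figure reported in the table.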