Table 1.
| | MERNIS | USA | ADULT | HDV | MIDUS |
|---|---|---|---|---|---|
| Corpus: n | 8,820,049 | 3,061,692 | 32,561 | 8,403 | 7,108 |
| Corpus: c | 10 | 40 | 50 | 50 | 60 |
| Corpus: [min Ξ, max Ξ] | [0.087, 0.844] | [0.000, 0.961] | [0.000, 0.794] | [0.002, 0.941] | [0.052, 0.944] |
| Sampling fraction 100% | 0.029 ± 0.019 | 0.028 ± 0.026 | 0.018 ± 0.016 | 0.006 ± 0.009 | 0.018 ± 0.014 |
| Sampling fraction 10% | 0.030 ± 0.019 | 0.028 ± 0.016 | 0.022 ± 0.020 | 0.011 ± 0.009 | 0.035 ± 0.044 |
| Sampling fraction 5% | 0.029 ± 0.019 | 0.027 ± 0.016 | 0.027 ± 0.023 | 0.015 ± 0.012 | 0.037 ± 0.055 |
| Sampling fraction 1% | 0.029 ± 0.019 | 0.029 ± 0.015 | 0.027 ± 0.014 | 0.045 ± 0.050 | 0.055 ± 0.079 |
| Sampling fraction 0.5% | 0.028 ± 0.019 | 0.029 ± 0.015 | 0.048 ± 0.039 | | |
| Sampling fraction 0.1% | 0.026 ± 0.017 | 0.058 ± 0.037 | | | |
Our model accurately estimates population uniqueness Ξ even when only a small, or very small, fraction of the population is available. n denotes the population size and c the corpus size (the total number of populations considered per corpus). We do not estimate population uniqueness when the sampled dataset contains fewer than 50 records; these cells are left blank.
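The blank cells follow from simple arithmetic on the 50-record rule. The sketch below assumes that a sampling fraction f of a population of size n yields about n × f records; the helper names (`is_estimated`, `estimable_grid`) are hypothetical and only illustrate the rule, not the authors' code. Under that assumption it reproduces the table's pattern: ADULT drops out at 0.1% (about 33 records), and HDV and MIDUS drop out at 0.5% (about 42 and 36 records).

```python
# Hypothetical sketch: which (corpus, sampling fraction) cells can be estimated,
# assuming the sample holds n * fraction records and that estimation is skipped
# when fewer than 50 records are sampled (the rule stated in the caption).
MIN_RECORDS = 50

# Population sizes n taken from the table.
POPULATION_SIZES = {
    "MERNIS": 8_820_049,
    "USA": 3_061_692,
    "ADULT": 32_561,
    "HDV": 8_403,
    "MIDUS": 7_108,
}

# Sampling fractions in the order of the table rows.
FRACTIONS = [1.0, 0.10, 0.05, 0.01, 0.005, 0.001]


def expected_sample_size(n: int, fraction: float) -> float:
    """Expected number of records when sampling `fraction` of n records."""
    return n * fraction


def is_estimated(n: int, fraction: float, min_records: int = MIN_RECORDS) -> bool:
    """True if the sampled dataset is large enough to estimate uniqueness."""
    return expected_sample_size(n, fraction) >= min_records


def estimable_grid() -> dict[str, list[bool]]:
    """One row per corpus, one boolean per sampling fraction."""
    return {
        name: [is_estimated(n, f) for f in FRACTIONS]
        for name, n in POPULATION_SIZES.items()
    }


if __name__ == "__main__":
    for name, row in estimable_grid().items():
        cells = ["yes" if ok else "blank" for ok in row]
        print(f"{name:7s}", " ".join(f"{c:>5s}" for c in cells))
```

Real subsampling would draw a random number of records close to n × f, so cells very near the threshold could behave differently; none of the corpora here fall close to 50.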
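For readers who want to compute comparable summaries, the sketch below shows two pieces under stated assumptions. First, it takes population uniqueness Ξ to be the fraction of records whose full attribute combination occurs exactly once in the population (the usual definition, assumed here). Second, it assumes each table entry is a mean ± standard deviation of the absolute error |Ξ̂ − Ξ| over the c populations of a corpus; the caption does not state this, and whether a sample or population standard deviation was used is also an assumption. The function names are hypothetical, and the estimator Ξ̂ itself (the model) is not shown.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from statistics import mean, stdev


def population_uniqueness(records: Sequence[tuple[Hashable, ...]]) -> float:
    """Fraction of records whose full attribute combination appears exactly once.

    Assumes `records` is the complete population, one tuple of attributes per person.
    """
    counts = Counter(records)
    unique = sum(1 for r in records if counts[r] == 1)
    return unique / len(records)


def summarize_errors(true_xi: Sequence[float], est_xi: Sequence[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of |estimated - true| across populations.

    Assumption: this mirrors the "mean ± s.d." form of the table entries.
    """
    if len(true_xi) != len(est_xi) or len(true_xi) < 2:
        raise ValueError("need two or more paired values")
    errors = [abs(e - t) for t, e in zip(true_xi, est_xi)]
    return mean(errors), stdev(errors)


if __name__ == "__main__":
    # Toy population: (1, "a") appears twice, the other two combinations once.
    toy = [(1, "a"), (1, "a"), (2, "b"), (3, "c")]
    print(population_uniqueness(toy))

    # Toy true and estimated uniqueness values for three populations.
    m, s = summarize_errors([0.1, 0.2, 0.3], [0.1, 0.25, 0.2])
    print(f"{m:.3f} ± {s:.3f}")
```

In the toy population two of four records are unique, so Ξ = 0.5.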