Metric-entropy ratio (ratio of database entries to clusters) and fractal dimension at typical search radii for four datasets.
The metric-entropy ratio estimates how much faster coarse search is than naïve search, and as long as the fractal dimension is low, coarse search should dominate total search time. NCBI’s non-redundant ‘NR’ protein and ‘NT’ nucleotide sequence databases are from June 2015, the Protein Data Bank (PDB) is from July 2015, and PubChem is from October 2013.
Dataset | Metric-entropy ratio | Fractal dimension |
---|---|---|
Nucleotide sequences (NCBI NT) | 7:1 | 1.5 |
Protein sequences (NCBI NR) | 5:1 | 1.6 |
Protein structure (PDB) | 10:1 | 2.5 |
Chemical structure (PubChem) | 11:1 | 0.2 |
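The two quantities in the table can be illustrated with a short sketch. It computes the metric-entropy ratio as entries per cluster and estimates a local fractal dimension from neighbour counts at two radii, assuming the count grows like $N(r) \propto r^d$ between them. The function names, the example numbers, and the cost model in `estimated_comparisons` (one coarse comparison per cluster plus a fine search whose size is assumed to scale as $((r + 2r_c)/r)^d$ for query radius $r$ and cluster radius $r_c$) are hypothetical and for illustration only; none of the numbers come from the datasets above.

```python
import math


def metric_entropy_ratio(num_entries: int, num_clusters: int) -> float:
    """Entries per cluster: a rough estimate of how many fewer comparisons
    coarse search (one per cluster representative) makes than naive search
    (one per entry)."""
    if num_entries <= 0 or num_clusters <= 0:
        raise ValueError("counts must be positive")
    return num_entries / num_clusters


def local_fractal_dimension(n_r1: int, n_r2: int, r1: float, r2: float) -> float:
    """Estimate fractal dimension from neighbour counts n_r1, n_r2 within
    radii r1 < r2, assuming N(r) ~ r**d on that range, so that
    d = log(n_r2 / n_r1) / log(r2 / r1)."""
    if not (0 < r1 < r2) or n_r1 <= 0 or n_r2 <= 0:
        raise ValueError("need 0 < r1 < r2 and positive counts")
    return math.log(n_r2 / n_r1) / math.log(r2 / r1)


def estimated_comparisons(num_clusters: int, output_size: int,
                          r: float, r_c: float, d: float) -> float:
    """Hypothetical cost model (an assumption, not a measured result):
    num_clusters coarse comparisons plus a fine search over candidate
    clusters whose size grows like output_size * ((r + 2*r_c) / r) ** d."""
    fine = output_size * ((r + 2 * r_c) / r) ** d
    return num_clusters + fine


if __name__ == "__main__":
    # Illustrative numbers only, not taken from NT, NR, PDB, or PubChem.
    print(metric_entropy_ratio(7_000_000, 1_000_000))
    print(local_fractal_dimension(100, 400, 1.0, 2.0))
    # With a low fractal dimension, the fine-search term stays small next
    # to the number of clusters, so the coarse step dominates.
    for d in (0.2, 1.5, 2.5):
        print(d, estimated_comparisons(1_000, 10, 1.0, 0.5, d))
```

Under this assumed model, the fine-search term grows exponentially in $d$, which is why a low fractal dimension keeps total search time close to the cost of the coarse step.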