Table 2.
Formulas for the various similarity and distance metrics
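| Distance metric | Formula for continuous variables<sup>a</sup> | Formula for dichotomous variables<sup>a</sup> |
|---|---|---|
| Manhattan distance | $D_{A,B} = \sum_j \lvert x_{jA} - x_{jB} \rvert$ | $D_{A,B} = a + b - 2c$ |
| Euclidean distance | $D_{A,B} = \left[\sum_j (x_{jA} - x_{jB})^2\right]^{1/2}$ | $D_{A,B} = [a + b - 2c]^{1/2}$ |
| Cosine coefficient | $S_{A,B} = \sum_j x_{jA} x_{jB} \,/\, \left[\sum_j x_{jA}^2 \sum_j x_{jB}^2\right]^{1/2}$ | $S_{A,B} = c/[ab]^{1/2}$ |
| Dice coefficient | $S_{A,B} = 2\sum_j x_{jA} x_{jB} \,/\, \left[\sum_j x_{jA}^2 + \sum_j x_{jB}^2\right]$ | $S_{A,B} = 2c/[a + b]$ |
| Tanimoto coefficient | $S_{A,B} = \sum_j x_{jA} x_{jB} \,/\, \left[\sum_j x_{jA}^2 + \sum_j x_{jB}^2 - \sum_j x_{jA} x_{jB}\right]$ | $S_{A,B} = c/[a + b - c]$ |
| Soergel distance<sup>b</sup> | $D_{A,B} = \sum_j \lvert x_{jA} - x_{jB} \rvert \,/\, \sum_j \max(x_{jA}, x_{jB})$ | $D_{A,B} = 1 - c/[a + b - c]$ |
| Substructure similarity | See Ref [24] | |
| Superstructure similarity | See Ref [25] | |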
<sup>a</sup> S denotes similarities, while D denotes distances (according to the more commonly used formula for the given metric). Note that distances and similarities can be converted to one another using Equation 1. $x_{jA}$ denotes the j-th feature of molecule A. a is the number of on bits in molecule A, b is the number of on bits in molecule B, and c is the number of bits that are on in both molecules.
<sup>b</sup> The Soergel distance is the complement of the Tanimoto coefficient.
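
As an illustration of the dichotomous formulas, the following minimal Python sketch computes the bit-count metrics from the quantities a, b and c defined in footnote a. It is not taken from the paper: the function names `bit_counts` and `dichotomous_metrics` are hypothetical, fingerprints are assumed to be given as sets of on-bit indices, and both fingerprints are assumed to have at least one bit set. The Euclidean and cosine lines use the standard bit-count forms, sqrt(a + b − 2c) and c/sqrt(ab), and the Soergel distance is taken as one minus the Tanimoto coefficient, following footnote b.

```python
from math import sqrt


def bit_counts(fp_a, fp_b):
    """Return (a, b, c) for two fingerprints given as sets of on-bit indices.

    a: number of on bits in A, b: number of on bits in B,
    c: number of bits that are on in both.
    """
    return len(fp_a), len(fp_b), len(fp_a & fp_b)


def dichotomous_metrics(fp_a, fp_b):
    """Compute the bit-count (dichotomous) metrics for two fingerprints.

    Assumes both fingerprints are non-empty; otherwise the ratios
    below would divide by zero.
    """
    a, b, c = bit_counts(fp_a, fp_b)
    tanimoto = c / (a + b - c)
    return {
        "manhattan": a + b - 2 * c,        # distance
        "euclidean": sqrt(a + b - 2 * c),  # distance
        "cosine": c / sqrt(a * b),         # similarity
        "dice": 2 * c / (a + b),           # similarity
        "tanimoto": tanimoto,              # similarity
        "soergel": 1 - tanimoto,           # distance, complement of Tanimoto
    }


if __name__ == "__main__":
    # Toy fingerprints (made up for illustration): a = 4, b = 6, c = 3.
    fp_a = {1, 2, 3, 5}
    fp_b = {2, 3, 5, 8, 13, 21}
    for name, value in dichotomous_metrics(fp_a, fp_b).items():
        print(f"{name}: {value:.4f}")
```

For these toy fingerprints the Manhattan distance is 4, the Euclidean distance is 2, and the Dice coefficient is 0.6. The Tanimoto coefficient (3/7) and the Soergel distance (4/7) sum to one, as footnote b implies.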