2015 May 20;7:20. doi: 10.1186/s13321-015-0069-3

Table 2.

Formulas for the various similarity and distance metrics

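| Metric | Formula for continuous variables$^{a}$ | Formula for dichotomous variables$^{a}$ |
|---|---|---|
| Manhattan distance | $D_{A,B} = \sum_{j=1}^{n} \lvert x_{jA} - x_{jB} \rvert$ | $D_{A,B} = a + b - 2c$ |
| Euclidean distance | $D_{A,B} = \left[\sum_{j=1}^{n} (x_{jA} - x_{jB})^2\right]^{1/2}$ | $D_{A,B} = [a + b - 2c]^{1/2}$ |
| Cosine coefficient | $S_{A,B} = \sum_{j=1}^{n} x_{jA} x_{jB} \Big/ \left[\sum_{j=1}^{n} x_{jA}^2 \sum_{j=1}^{n} x_{jB}^2\right]^{1/2}$ | $S_{A,B} = c/[ab]^{1/2}$ |
| Dice coefficient | $S_{A,B} = 2\sum_{j=1}^{n} x_{jA} x_{jB} \Big/ \left[\sum_{j=1}^{n} x_{jA}^2 + \sum_{j=1}^{n} x_{jB}^2\right]$ | $S_{A,B} = 2c/[a + b]$ |
| Tanimoto coefficient | $S_{A,B} = \sum_{j=1}^{n} x_{jA} x_{jB} \Big/ \left[\sum_{j=1}^{n} x_{jA}^2 + \sum_{j=1}^{n} x_{jB}^2 - \sum_{j=1}^{n} x_{jA} x_{jB}\right]$ | $S_{A,B} = c/[a + b - c]$ |
| Soergel distance$^{b}$ | $D_{A,B} = \sum_{j=1}^{n} \lvert x_{jA} - x_{jB} \rvert \Big/ \sum_{j=1}^{n} \max(x_{jA}, x_{jB})$ | $D_{A,B} = 1 - c/[a + b - c]$ |
| Substructure similarity | See Ref [24] | |
| Superstructure similarity | See Ref [25] | |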

$^{a}$ $S$ denotes similarities, while $D$ denotes distances (according to the more commonly used formula for the given metric). Distances and similarities can be converted to one another using Equation 1. $x_{jA}$ denotes the j-th feature of molecule A, and n is the number of features. In the dichotomous formulas, a is the number of on bits in molecule A, b is the number of on bits in molecule B, and c is the number of bits that are on in both molecules.

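$^{b}$ For dichotomous variables, the Soergel distance is the complement of the Tanimoto coefficient ($D_{A,B} = 1 - S_{A,B}$); for continuous variables the two formulas do not, in general, satisfy this relation.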
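The formulas in Table 2 can be checked numerically. The following Python sketch (standard library only) implements the continuous-variable column for equal-length, non-negative feature vectors and the dichotomous column in terms of a, b and c. The function names, the representation of a binary fingerprint as a Python int, and the assumption that no denominator is zero (neither vector is all zeros) are illustrative choices, not part of the original work. Substructure and superstructure similarity (Refs [24, 25]) are not covered.

```python
import math
from typing import Dict, Sequence, Tuple

# Continuous-variable formulas for feature vectors x (molecule A) and y (molecule B).
# Assumption: equal length n, non-negative features, neither vector all zeros.

def _check(x: Sequence[float], y: Sequence[float]) -> None:
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length n")

def _dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(x, y))

def manhattan(x, y):
    # D = sum_j |x_jA - x_jB|
    _check(x, y)
    return sum(abs(p - q) for p, q in zip(x, y))

def euclidean(x, y):
    # D = [sum_j (x_jA - x_jB)^2]^(1/2)
    _check(x, y)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def cosine(x, y):
    # S = sum x_jA x_jB / [sum x_jA^2 * sum x_jB^2]^(1/2)
    _check(x, y)
    return _dot(x, y) / math.sqrt(_dot(x, x) * _dot(y, y))

def dice(x, y):
    # S = 2 sum x_jA x_jB / [sum x_jA^2 + sum x_jB^2]
    _check(x, y)
    return 2 * _dot(x, y) / (_dot(x, x) + _dot(y, y))

def tanimoto(x, y):
    # S = sum x_jA x_jB / [sum x_jA^2 + sum x_jB^2 - sum x_jA x_jB]
    _check(x, y)
    xy = _dot(x, y)
    return xy / (_dot(x, x) + _dot(y, y) - xy)

def soergel(x, y):
    # D = sum |x_jA - x_jB| / sum max(x_jA, x_jB)
    _check(x, y)
    return manhattan(x, y) / sum(max(p, q) for p, q in zip(x, y))

# Dichotomous-variable formulas, expressed through the bit counts a, b, c.

def bit_counts(fp_a: int, fp_b: int) -> Tuple[int, int, int]:
    """Return (a, b, c) for two fingerprints stored as Python ints (hypothetical encoding)."""
    a = bin(fp_a).count("1")         # on bits in molecule A
    b = bin(fp_b).count("1")         # on bits in molecule B
    c = bin(fp_a & fp_b).count("1")  # bits on in both molecules
    return a, b, c

def dichotomous(a: int, b: int, c: int) -> Dict[str, float]:
    return {
        "manhattan": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        "tanimoto": c / (a + b - c),
        "soergel": 1 - c / (a + b - c),
    }
```

For the bit vectors 1011 and 1101 (a = 3, b = 3, c = 2), the continuous functions applied to the 0/1 lists and the dichotomous formulas agree, giving a Tanimoto similarity of 0.5 and a Soergel distance of 0.5. For the one-feature continuous vectors (2) and (1), however, the Tanimoto coefficient is 2/3 while the Soergel distance is 1/2, so the complement relation holds only for dichotomous data.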