2020 May 20;12:35. doi: 10.1186/s13321-020-00439-2

Fig. 6.

Chemical space coverage by training set S and test sets TCP and TMC. The TMC data set consists of 40 HS compounds and 1200 ES compounds; random samples of 1240 compounds were drawn from the S and TCP data sets. Each compound was encoded as a 1024-bit ECFP4 fingerprint. The dimensionality of the input space was reduced by SVD to 500 components, which explain 88% of the variance in the data.
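
The reduction step described in the caption can be illustrated with a minimal sketch. It assumes NumPy and uses random 1024-bit vectors as a stand-in for real ECFP4 fingerprints, so the retained variance it reports will not match the 88% quoted above. The function name `truncated_svd` and the column centering are illustrative choices, not details taken from the article.

```python
import numpy as np


def truncated_svd(X, n_components):
    """Project the rows of X onto the top n_components singular directions.

    Returns the reduced matrix and the fraction of total variance retained.
    Centering the columns is an assumption of this sketch.
    """
    Xc = X - X.mean(axis=0)  # center so squared singular values track variance
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    reduced = U[:, :n_components] * s[:n_components]
    explained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return reduced, explained


rng = np.random.default_rng(0)
# Stand-in data: 1240 random 1024-bit vectors (NOT real ECFP4 fingerprints).
X = rng.integers(0, 2, size=(1240, 1024)).astype(float)

reduced, explained = truncated_svd(X, 500)
print(reduced.shape)
print(f"fraction of variance retained: {explained:.2f}")
```

With real fingerprints, bits are strongly correlated across compounds, so far fewer components carry most of the variance than in this random stand-in.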