Correction to CSAR Benchmark Exercise of 2010: Selection of the Protein–Ligand Complexes

James B Dunbar, Jr; Richard D Smith; Chao-Yie Yang; Peter Man-Un Ung; Katrina W Lexa; Nickolay A Khazanov; Jeanne A Stuckey; Shaomeng Wang; Heather A Carlson

2011 Aug 19;51(9):2146. doi: 10.1021/ci200363q


PMCID: PMC3180240

This erratum clarifies that the values reported as R² in the paper are actually Pearson R values. The wrong column of data in a spreadsheet was used inadvertently. All correlation values in the paper are numerically correct; they are only mislabeled with the squared superscript. One of the major conclusions noted in the abstract and discussed in the “Strengths and Weaknesses” section should read:

“Inherent experimental error limits the possible correlation between scores and measured affinity; Pearson R is limited to ∼0.91 (Pearson R² ∼0.83) when fitting to the data set without over parameterizing. Pearson R is limited to ∼0.83 (Pearson R² ∼0.70) when scoring the data set with a method trained on outside data.”

For clarity, Table 1 below gives the Pearson R and R² for all the theoretical cases posed. The table corrects the correlation coefficients in Figure 3 and in the discussion of signal over noise in the “Strengths and Weaknesses” section.

Table 1. Correlation Metrics when Random Error is Added to the 343 Affinity Data of the CSAR-NRC Data Set^a

Added error (log K)       σ = 0.5    σ = 1.0    σ = 2.0    σ = 3.0

Random Error in One Coordinate (Ideal vs Lab Case)
  Pearson R               0.976      0.913      0.744      0.590
  (Pearson R)²            0.952      0.834      0.554      0.348

Random Error in Both Coordinates (Lab vs Scoring Case)
  Pearson R               0.952      0.835      0.553      0.355
  (Pearson R)²            0.907      0.696      0.305      0.130

^a Values are the medians of 100 generations of random error.
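The kind of calculation summarized in Table 1 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the CSAR-NRC affinities are not reproduced here, so the 343 values are a hypothetical synthetic sample (normally distributed, with an arbitrarily chosen mean and spread). The reading of "one coordinate" as error-free values versus noisy values, and "both coordinates" as two independently noisy copies, is likewise an interpretation. The function names are illustrative. Because the data are synthetic, the printed numbers will not match Table 1.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_correlations(true_values, sigma, n_trials=100, seed=0):
    """Median Pearson R over n_trials generations of Gaussian error.

    Returns (one_coordinate, both_coordinates):
      one_coordinate   -> error-free values vs. values with added error
      both_coordinates -> two independently perturbed copies of the values
    """
    rng = random.Random(seed)
    one, both = [], []
    for _ in range(n_trials):
        noisy_a = [v + rng.gauss(0.0, sigma) for v in true_values]
        noisy_b = [v + rng.gauss(0.0, sigma) for v in true_values]
        one.append(pearson_r(true_values, noisy_a))
        both.append(pearson_r(noisy_a, noisy_b))
    return statistics.median(one), statistics.median(both)


# ASSUMPTION: hypothetical stand-in for the 343 affinities (log K units).
# The mean 6.0 and standard deviation 2.2 are arbitrary illustration values.
_data_rng = random.Random(42)
synthetic_affinities = [_data_rng.gauss(6.0, 2.2) for _ in range(343)]

for sigma in (0.5, 1.0, 2.0, 3.0):
    r_one, r_both = median_correlations(synthetic_affinities, sigma)
    print(f"sigma={sigma}: one coordinate R={r_one:.3f} (R^2={r_one**2:.3f}), "
          f"both coordinates R={r_both:.3f} (R^2={r_both**2:.3f})")
```

With any such sample, error in both coordinates should lower the correlation more than error in one coordinate alone, which is the qualitative pattern in Table 1.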

It should be noted that our use of R² is based on squaring the Pearson value, not based on a calculation of the coefficient of determination (also called R²). The coefficient of determination measures the one-to-one correspondence between two values, requiring a slope of 1 and an intercept at 0 rather than least-squares-fit values.
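The distinction can be made concrete with a small sketch. It assumes the common definition of the coefficient of determination as 1 - SS_res/SS_tot, with residuals taken directly between measured and predicted values (that is, against the line of slope 1 and intercept 0). The function names and the toy numbers are illustrative and do not come from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_determination(measured, predicted):
    """1 - SS_res/SS_tot, with residuals measured - predicted (no refitting)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Toy example: predictions are perfectly linear in the measurements,
# but with slope 2 and intercept 1 instead of slope 1 and intercept 0.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0 * m + 1.0 for m in measured]

r = pearson_r(measured, predicted)
print("Pearson R:", r)                      # 1.0
print("(Pearson R)^2:", r ** 2)             # 1.0
print("Coefficient of determination:",
      coefficient_of_determination(measured, predicted))  # -8.0
```

Squaring the Pearson value rewards any linear relationship, whereas the coefficient of determination penalizes predictions that are systematically scaled or offset, and can even be negative.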

Acknowledgments

We thank Christian Kramer of Novartis Pharma AG for pointing out that the R² values in the paper were likely R and for very stimulating discussions regarding Pearson R² versus the coefficient of determination.

Funding Statement

National Institutes of Health, United States
