Croatian Medical Journal. 2015 Oct;56(5):490–492. doi: 10.3325/cmj.2015.56.490

How much precision in reporting statistics is enough?

Farrokh Habibzadeh 1,2, Parham Habibzadeh 3
PMCID: PMC4679338  PMID: 26526886

Depending on the accuracy of the tools we employ in our research, each variable is measured within a certain degree of precision. For example, in most clinical studies on adults, age is measured in years. Generally, measuring the age with more accuracy in such studies is neither necessary nor of any particular importance. However, we might measure blood pH in the same study with two or even three digits after the decimal point because minute changes in blood pH are associated with serious clinical implications. Statistical software programs commonly used in the analysis of research data, however, calculate the results with a predefined precision, say, three digits after the decimal point, no matter how accurately the raw data were measured. Therefore, the software would report the mean of both of the mentioned variables, age and pH, with three digits after the decimal point.

The question arises: how should we report these statistics in scientific articles? Apparently, there is no consensus on this issue. For example, some references suggest that statistics (eg, means and standard deviations [SDs]) should not be reported with a precision higher than the accuracy of the measured data (1); many researchers recommend using only one decimal place more than the precision used to measure the variable (2,3); and some mention that although means should be reported to no more than one decimal place beyond that of the raw data, SDs may need to be reported with an extra decimal place (4). Considering the existing controversy and the importance of this issue, in this commentary, we try to provide a reasonable answer to this question.

Suppose that variable x is measured with precision of α and reported as Inline graphic. Then, we can write:

Inline graphic (Eq. 1)

where,

Inline graphic (Eq. 2)

Then,

Inline graphic

which yields,

Inline graphic

and,

Inline graphic

But, considering Eq. 2, we have:

Inline graphic

and

Inline graphic

Thus,

Inline graphic

which means that the precision of measurement of the mean, Inline graphic, is typically expected to be near zero because errors in the measurements are presumably random – some are positive and some are negative. However, the absolute error in general would be ≤α. Therefore, the mean value cannot be reported with a precision higher than that used in the measurement of the raw data.
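One way to write out the argument for the mean is sketched below. The notation is assumed here for illustration and is not necessarily the authors' own: x_i is the true value, \hat{x}_i the recorded value, \varepsilon_i the measurement error, and n the sample size.

```latex
% Sketch in assumed notation: each true value is the recorded value plus a bounded error.
\begin{align*}
x_i &= \hat{x}_i + \varepsilon_i, \qquad -\alpha \le \varepsilon_i \le \alpha \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
         = \frac{1}{n}\sum_{i=1}^{n} \hat{x}_i + \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i
         = \bar{\hat{x}} + \bar{\varepsilon} \\
-n\alpha &\le \sum_{i=1}^{n} \varepsilon_i \le n\alpha
\quad\Longrightarrow\quad
-\alpha \le \bar{\varepsilon} \le \alpha
\end{align*}
```

Read this way, the mean of the recorded values can differ from the mean of the true values by at most α, which is the bound stated in the text.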

For the variance (SD²), beginning with Eq. 1, we will have:

Inline graphic (Eq. 3)

But,

Inline graphic

The second term in the squared bracket is negligible and the above equation then becomes:

Inline graphic

Combining this equation with Eq. 3, yields:

Inline graphic

Then,

Inline graphic

The most probable error in the calculation of the variance (SD²) would be:

Inline graphic

But, theoretically it can be as high as

Inline graphic

This value is 2α² when the sample size is only two; for large sample sizes, however, it would be almost α². Considering the amount of variability in the variance (SD²), the precision of the SD is therefore no more than α. As a result, it seems reasonable to conclude that SDs, too, should not be reported with more precision than the accuracy of the measured raw data.
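A bound with the stated limiting values (2α² for n = 2, approaching α² for large n) can be obtained along the following lines. This is a sketch in assumed notation (\hat{x}_i for the recorded value, \varepsilon_i for the measurement error, s^2 and \hat{s}^2 for the variances of the true and recorded values). It also assumes that the cross term averages out because the errors are unrelated to the recorded values, which the text does not confirm.

```latex
% Sketch under stated assumptions; not necessarily the authors' exact steps.
\begin{align*}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
     = \frac{1}{n-1}\sum_{i=1}^{n}
       \left[ (\hat{x}_i - \bar{\hat{x}}) + (\varepsilon_i - \bar{\varepsilon}) \right]^2 \\
    &= \hat{s}^2
     + \frac{2}{n-1}\sum_{i=1}^{n} (\hat{x}_i - \bar{\hat{x}})(\varepsilon_i - \bar{\varepsilon})
     + \frac{1}{n-1}\sum_{i=1}^{n} (\varepsilon_i - \bar{\varepsilon})^2 \\
\frac{1}{n-1}\sum_{i=1}^{n} (\varepsilon_i - \bar{\varepsilon})^2
    &\le \frac{1}{n-1}\sum_{i=1}^{n} \varepsilon_i^2
     \le \frac{n}{n-1}\,\alpha^2
\end{align*}
```

The last line uses the identity \sum (\varepsilon_i - \bar{\varepsilon})^2 = \sum \varepsilon_i^2 - n\bar{\varepsilon}^2 together with |\varepsilon_i| \le \alpha. Setting n = 2 gives 2α², and n/(n − 1) tends to 1 as n grows.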

Example

Suppose that we measured serum total bilirubin of 10 newborn patients with hyperbilirubinemia. In a clinical laboratory, total bilirubin is normally measured with a precision of ±0.05 mg/dL (one digit after the decimal point). For example, all values of total bilirubin between 5.35 and 5.44 mg/dL would be recorded as 5.4 mg/dL. Assuming the values in the second column of Table 1 (measured values) are our readings, columns 3 to 5 would be possible more accurate values for the measured bilirubin levels. Considering the accuracy in the measurement of total bilirubin in a clinical laboratory (±0.05 mg/dL), all these data sets (Table 1: columns 2 to 5) are practically identical (to one digit after the decimal point).
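The recording rule in this example can be sketched in Python. The helper name `record` is hypothetical, and the half-up rounding rule is an assumption inferred from the statement that 5.35 mg/dL is recorded as 5.4 mg/dL. `Decimal` is used because binary floating point cannot represent values such as 5.35 exactly.

```python
from decimal import Decimal, ROUND_HALF_UP


def record(true_value: str) -> Decimal:
    """Round a reading to one decimal place, with halves rounded up (assumed rule)."""
    return Decimal(true_value).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Readings from 5.35 up to (but not including) 5.45 all collapse to 5.4.
for reading in ["5.34", "5.35", "5.40", "5.44", "5.45"]:
    print(reading, "->", record(reading))
```

Any two true values that map to the same recorded value are indistinguishable at the measurement precision, which is the sense in which the data sets in Table 1 are "practically identical".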

Table 1.

Serum total bilirubin of 10 newborns measured with a precision of ±0.05 mg/dL

              Measured values   Three possible values
              3.2               3.22          3.23          3.20
              3.1               3.14          3.14          3.13
              4.5               4.45          4.51          4.47
              8.2               8.23          8.21          8.19
              9.3               9.34          9.27          9.28
              11.7              11.73         11.65         11.65
              10.0              10.01         10.02         9.99
              10.8              10.84         10.79         10.78
              7.1               7.14          7.13          7.09
              6.8               6.84          6.82          6.82
Mean (SD)*    7.5 (3.1)         7.5 (3.1)     7.5 (3.1)     7.5 (3.1)
Mean (SD)†    7.47 (3.09)       7.49 (3.10)   7.48 (3.07)   7.46 (3.08)

*Mean (standard deviation) reported with ±0.05 mg/dL (one digit after the decimal point).
†Mean (standard deviation) reported with ±0.005 mg/dL (two digits after the decimal point).

Because our measurement precision was ±0.05 mg/dL (one digit after the decimal point), according to what we found above, the precision to be used for reporting the mean and SD should also be ±0.05 mg/dL (one digit after the decimal point). The means and SDs of all these practically identical data sets do not differ if they are reported with the same accuracy we used to measure the raw data (7.5 [SD 3.1] mg/dL). However, if we report the mean and SD with more precision than the accuracy we used to measure the raw data (eg, two digits after the decimal point), we have a mean of 7.47 (SD 3.09) mg/dL, which is different from the real means and SDs of the other possible data sets (Table 1).
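The comparison can be reproduced with a short Python sketch using the values in Table 1. The dictionary keys and the helper `summarize` are names chosen here for illustration. The SD is taken to be the sample standard deviation (n − 1 denominator), which is an assumption, although it agrees with the tabulated figures.

```python
from statistics import mean, stdev

# Table 1: the recorded values, followed by three possible sets of true values.
data_sets = {
    "measured":   [3.2, 3.1, 4.5, 8.2, 9.3, 11.7, 10.0, 10.8, 7.1, 6.8],
    "possible_1": [3.22, 3.14, 4.45, 8.23, 9.34, 11.73, 10.01, 10.84, 7.14, 6.84],
    "possible_2": [3.23, 3.14, 4.51, 8.21, 9.27, 11.65, 10.02, 10.79, 7.13, 6.82],
    "possible_3": [3.20, 3.13, 4.47, 8.19, 9.28, 11.65, 9.99, 10.78, 7.09, 6.82],
}


def summarize(values, decimals):
    """Return (mean, sample SD), each rounded to the given number of decimals."""
    return round(mean(values), decimals), round(stdev(values), decimals)


one_decimal = {name: summarize(v, 1) for name, v in data_sets.items()}
two_decimals = {name: summarize(v, 2) for name, v in data_sets.items()}

for name in data_sets:
    print(f"{name}: {one_decimal[name]} at one decimal, "
          f"{two_decimals[name]} at two decimals")
```

At one decimal place all four data sets give the same summary. At two decimal places each gives a different one, even though the data sets cannot be told apart at the measurement precision.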

The precision for reporting each statistic depends on how that statistic is derived. For example, if the precision of a measurement is ±α (say ±0.05, one digit after the decimal point), we should report the mean and SD with the same precision but the variance (SD²) with two digits after the decimal point (α² = 0.0025, assuming a large sample size). Because percentiles (including the 25th, 50th [median], and 75th percentiles) are obtained through linear calculations, all percentiles, and hence the interquartile range (IQR), should be reported with a precision not higher than the measurement precision, as with the mean and SD.
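The point about percentiles can be illustrated with the Table 1 data. This is a minimal sketch: the names `ALPHA`, `quartiles`, and `max_shift` are chosen here, and `statistics.quantiles` with its default "exclusive" method is only one of several percentile conventions. It interpolates linearly between neighboring order statistics.

```python
from statistics import quantiles

ALPHA = 0.05  # measurement precision in mg/dL, taken from the example

measured = [3.2, 3.1, 4.5, 8.2, 9.3, 11.7, 10.0, 10.8, 7.1, 6.8]
possible = [
    [3.22, 3.14, 4.45, 8.23, 9.34, 11.73, 10.01, 10.84, 7.14, 6.84],
    [3.23, 3.14, 4.51, 8.21, 9.27, 11.65, 10.02, 10.79, 7.13, 6.82],
    [3.20, 3.13, 4.47, 8.19, 9.28, 11.65, 9.99, 10.78, 7.09, 6.82],
]


def quartiles(values):
    """Return [Q1, median, Q3], interpolating linearly between order statistics."""
    return quantiles(values, n=4)


reference = quartiles(measured)

# Largest difference between a quartile of the recorded data and the
# corresponding quartile of any of the possible true-value data sets.
max_shift = max(
    abs(q - r)
    for candidate in possible
    for q, r in zip(quartiles(candidate), reference)
)
print(f"largest quartile shift: {max_shift:.3f} mg/dL (alpha = {ALPHA})")
```

Each quartile is a weighted average of order statistics, and each order statistic moves by at most α when every value moves by at most α. The shift therefore stays within the measurement precision, and extra decimal places in a reported percentile carry no information.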

Reporting statistics with more precision than necessary would be misleading (5,6). The number of decimal places reported for the mean, SD, median, and IQR in scientific reports should not exceed that of the measurement precision of the raw data.

References

  • 1. Priebe HJ. The results. In: Hall GM, editor. How to write a paper. 3rd ed. London: BMJ Publishing Group; 2003. p. 22-35.
  • 2. Lang TA, Secic M. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. 2nd ed. Philadelphia: American College of Physicians; 2006.
  • 3. Peat J, Elliot E, Baur L, Keena V. Scientific writing: easy when you know how. London: BMJ Publishing Group; 2002.
  • 4. Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. BMJ. 1983;286:1489–93. doi: 10.1136/bmj.286.6376.1489.
  • 5. Karhu D, Vanzieleghem M. Significance of digits in scientific research. AMWA J. 2013;28:58–60.
  • 6. Phillips CV, LaPole LM. Quantifying errors without random sampling. BMC Med Res Methodol. 2003;3:9. doi: 10.1186/1471-2288-3-9.

Articles from Croatian Medical Journal are provided here courtesy of Medicinska Naklada
