Dementia & Neuropsychologia. 2021 Oct-Dec;15(4):480–484. doi: 10.1590/1980-57642021dn15-040009

Clock drawing test: comparison between the Pfizer and the Shulman systems

Teste do desenho do relógio: comparação dos sistemas de Pfizer e de Shulman

Daniela Bertol Graeff 1, Jéssica Maldaner Lui 1, Nathália Dal Prá Zucco 1, Ana Luisa Sant’Anna Alves 1, Cassiano Mateus Forcelini 1, Bernadete Maria Dalmolin 1
PMCID: PMC9018089  PMID: 35509798

ABSTRACT

Cognitive decline can be screened by the clock drawing test (CDT), which has several versions.

Objective:

This survey aimed to analyze the correlation between two simple methods for scoring the CDT.

Methods:

This cross-sectional study was nested in the Elo-Creati cohort from Passo Fundo, Brazil, and comprised 404 subjects. Two previously trained raters scored the subjects’ CDT according to both the Pfizer and Shulman systems. The inter-observer and intra-observer concordance within each method, as well as the concordance of the scores between the two methods, was analyzed with Spearman’s rank correlation coefficient. Age and scholarity were also correlated with the scores.

Results:

Most of the participants were women (93.8%) and Caucasian (84.6%), with a mean age of 66.9 (±7.8) years and a scholarity of 10.9 years (±5.6). There was significant inter-observer (Pfizer: r=0.739, p≤0.001; Shulman: r=0.727, p≤0.001) and intra-observer correlation (Pfizer: rater 1, r=0.628, p≤0.001; rater 2, r=0.821, p≤0.001; Shulman: rater 1, r=0.843, p≤0.001; rater 2: r=0.819; p≤0.001). Intra-observer correlation was also observed comparing Pfizer and Shulman methods (rater 1: r=0.744; p≤0.001; rater 2: r=0.702; p≤0.001). There was weak correlation of the scores with scholarity (Pfizer: r=0.283, p≤0.001; Shulman: r=0.244, p≤0.001) and age (Pfizer: r=-0.174, p≤0.001; Shulman: r=-0.170, p≤0.001). More participants were classified with decreased cognition through the Pfizer system (rater 1: 44.3 vs. 26.5%; rater 2: 42.1 vs. 16.3%; p≤0.001).

Conclusions:

For this population, our results suggest that the Pfizer system of scoring CDT is more suitable for screening cognitive decline.

Keywords: cognitive decline, screening, clock drawing test, correlation

INTRODUCTION

The number of older people, including those living with cognitive decline and dementia, is rising. 1 The majority of older adults with dementia live in low- and middle-income countries, and such a preponderance will increase over the next decades. 2 Early identification of cognitive decline is desirable to allow adequate management and improve outcomes. 3

There are several cognitive screening tools for identifying cognitive decline and dementia, but the validation of these tests in illiterate and low-educated older adults is a challenge. 4,5 One of the simplest is the clock drawing test (CDT), which has more than one version with different scoring methods. The 5-item Shulman scoring system was considered an accurate method for widespread use in the diagnosis of dementia, although it requires a substantial understanding of its scoring system. 6 In contrast, small cross-sectional studies suggested that the CDT is not sensitive enough to detect very mild dementia either in educated people 7 or in those with limited education, 5 even though the CDT version employed in the latter study had a complicated scoring system. Moreover, additional uses for the several CDT versions have been described: evaluating the progression of cognitive decline, 8 differentiating types of dementia, 9 and even qualitatively classifying the elements of the CDT according to educational level. 10

The usefulness of the CDT is far from settled. In this setting, this survey aimed to analyze the correlation between two simple methods for scoring the CDT in a large series of adult and elderly people, as well as their inter- and intra-observer correlations.

METHODS

This cross-sectional study was nested in the Elo-Creati (Estudo Longitudinal do Centro de Referência e Atenção ao Idoso) cohort from the urban area of Passo Fundo, RS, Brazil. The cohort is conducted by the Universidade de Passo Fundo (UPF) and comprises 404 adult and elderly subjects from the community who have been followed since 2014. The Elo-Creati cohort has the objective of monitoring their health status and promoting well-being through the intervention of professors and students from health-related areas with experience in the application of cognitive tests. All subjects were invited to participate in this study, gave their written consent, and were evaluated between 2014 and 2015. No subject declined participation during the period of this research, which was conducted in accordance with the Declaration of Helsinki and was approved by the Ethical Committee of the UPF (Report Number 741.214).

Demographic and clinical data were obtained from a scheduled interview with each subject performed in a quiet and comfortable room, followed by the CDT task with a pencil on white paper, with the clock indicating “10 minutes to 2 o’clock.” Only the final drawing was evaluated according to the classical Pfizer and Shulman systems, regardless of the speed and agility with which it was drawn. For both systems, higher numbers indicate better performance. The Pfizer system consists of a 4-item scale, where one point is assigned for drawing a closed circle, one point for including all 12 numbers, one point for placing the numbers in the correct positions, and one point for placing the clock hands correctly. 7 The Shulman system assigns five points for a perfect clock, four points for minor visuospatial errors, three points for inaccurate representation of 10 minutes to 2 when the visuospatial organization is well done, two points for moderate visuospatial disorganization of numbers such that accurate denotation of 10 minutes to 2 is impossible, one point for a severe level of visuospatial disorganization, and zero points for inability to make any reasonable representation of a clock. 11 The cut-off for considering a drawing abnormal was any score different from 4 in the Pfizer method, 12 and any score below 4 in the Shulman system. 13
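As a rough illustration of the two scoring rules and cut-offs described above, the following Python sketch encodes them directly. The function names, parameter names, and the `SHULMAN_LEVELS` dictionary are hypothetical and chosen only for this example; they are not part of either published system, and judging each item (for instance, whether the hands are placed correctly) still depends on a trained rater.

```python
# Illustrative sketch only: names are hypothetical, not from the published systems.

# Shulman levels as summarized in the text (0 = worst, 5 = perfect clock).
SHULMAN_LEVELS = {
    5: "perfect clock",
    4: "minor visuospatial errors",
    3: "inaccurate time with well-done visuospatial organization",
    2: "moderate visuospatial disorganization of numbers",
    1: "severe visuospatial disorganization",
    0: "no reasonable representation of a clock",
}


def pfizer_score(closed_circle: bool, all_12_numbers: bool,
                 numbers_in_position: bool, hands_correct: bool) -> int:
    """One point per item satisfied, giving a score from 0 to 4."""
    return sum([closed_circle, all_12_numbers, numbers_in_position, hands_correct])


def is_abnormal_pfizer(score: int) -> bool:
    """Abnormal when the score is anything other than 4."""
    if not 0 <= score <= 4:
        raise ValueError("Pfizer score must be between 0 and 4")
    return score != 4


def is_abnormal_shulman(score: int) -> bool:
    """Abnormal when the score is below 4."""
    if score not in SHULMAN_LEVELS:
        raise ValueError("Shulman score must be between 0 and 5")
    return score < 4


# Hypothetical drawing: everything correct except the clock hands.
score = pfizer_score(True, True, True, False)
print(score, is_abnormal_pfizer(score))
```

Under these cut-offs, a drawing that misses a single Pfizer item is already abnormal, whereas a drawing with only minor visuospatial errors (Shulman score 4) still counts as normal.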

Two medical students were previously trained by an experienced neurologist. The training consisted of an explanation of both methods for scoring the CDT, followed by the evaluation of a series of CDT pictures obtained from elderly patients with cognitive decline. The training was considered complete when the students were able to score five consecutive CDT pictures adequately according to both methods.

All participants had their drawings photographed, and a code number was attributed to each one to avoid any breach of confidentiality. The raters independently evaluated all 404 drawings and scored them according to the Pfizer system and, after a week, according to the Shulman method. Each rater registered the results in separate Excel sheets so that a rater was not aware of the score attributed by the other rater, nor of their own previous (Pfizer) assessment. A third researcher was responsible for entering the data into the statistical program, and a fourth one performed the statistical analysis comparing the concordance between the two methods regarding subjects’ cognitive status and the inter-observer correlation.

To assess intra-observer correlation, 52 randomly selected drawings were reevaluated according to the Pfizer system and another 52 drawings according to the Shulman method, one week after the original assessment. Raters were blinded to their previous scores. As mentioned above, a third researcher collected the data and another one performed the statistical analysis.

Quantitative variables were presented as mean and standard deviation or, when appropriate, median and 25–75% interquartile range (IQR). Inter-observer and intra-observer correlations were calculated with Spearman’s rank correlation coefficient on the ordinal scores, and the strength of each correlation was graded with the bands used for kappa statistics (<0.2: poor; 0.21–0.4: weak; 0.41–0.6: moderate; 0.61–0.9: strong; 0.91–1: very strong). 14 Categorical data were described as percentage and absolute frequency, including the dichotomized variable normal/abnormal result in both scoring methods, which were compared with the chi-square test. The analyses were performed with the commercially available Statistical Package for the Social Sciences (SPSS) version 16.0 (SPSS Inc., Chicago, IL, USA). Statistical significance was assessed with a two-tailed p-value<0.05.
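For readers unfamiliar with the statistic, the sketch below shows one way to compute Spearman’s rank correlation coefficient for ordinal scores with many ties, using only the Python standard library. It is not the SPSS procedure used in the study; the function names and the example scores are hypothetical. Tied scores receive the average of the ranks they span, and the coefficient is the Pearson correlation of the two rank vectors.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical Pfizer scores given by two raters to the same eight drawings.
rater_1 = [4, 3, 4, 2, 4, 3, 1, 4]
rater_2 = [4, 4, 3, 2, 4, 3, 2, 4]
print(round(spearman_rho(rater_1, rater_2), 3))
```

Average ranks matter here because CDT scores take only a handful of values, so most observations are tied.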

RESULTS

Demographic and clinical characteristics of the sample are depicted in Table 1. Most of the subjects were Caucasian women.

Table 1. Demographic characteristics of the sample (n=404). Qualitative variables are presented as the absolute number and percentage, while quantitative data are exhibited as mean±standard deviation or when otherwise stated, median and interquartile range.

| Characteristics | Frequency (percentage) | Mean (±SD) | Median (IQR) |
|---|---|---|---|
| Gender | | | |
| Female | 379 (93.8%) | | |
| Male | 25 (6.2%) | | |
| Declared ethnicity | | | |
| Caucasian | 330 (84.6%) | | |
| Afro-Brazilian | 52 (13.3%) | | |
| Amerindian | 8 (2.1%) | | |
| Age (years) | | 66.9 (7.9) | |
| Scholarity (years) | | 10.9 (5.6) | |
| Pfizer score | | | 3.5 (3.0–4.0) |
| Shulman score | | | 5.0 (4.0–5.0) |

SD: standard deviation; IQR: interquartile range.

Since the scores in the Pfizer and Shulman systems are ordinal variables, the nonparametric Spearman’s rank correlation coefficient was employed. There was a strong inter-observer correlation for both methods (Table 2).

Table 2. Inter-observer correlation (n=404).

| Correlation | r* | 95%CI** | Interpretation of correlation |
|---|---|---|---|
| Pfizer test | 0.739 | 0.676–0.796 | Strong |
| Shulman test | 0.727 | 0.667–0.786 | Strong |

*Spearman’s rank correlation coefficient; **95%CI: 95% confidence interval.
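The article does not state how the confidence intervals were obtained. As a point of reference only, the sketch below uses the Fisher z-transformation, a common large-sample approximation for a correlation coefficient; applying it to Spearman’s coefficient with a standard error of 1/sqrt(n - 3) is an assumption of this example, so the result need not match the published intervals. The function name `fisher_ci` is hypothetical.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z-transformation.

    Assumes a standard error of 1/sqrt(n - 3) on the z scale, which is an
    approximation when r is a Spearman coefficient.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Inter-observer correlation reported for the Pfizer system (r=0.739, n=404).
low, high = fisher_ci(0.739, 404)
print(round(low, 3), round(high, 3))
```

The interval is asymmetric around r because the transformation is applied on the z scale and then mapped back.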

A random subset of 52 drawings was reevaluated with the Pfizer system and another random subset of 52 drawings with the Shulman method. Both raters performed this reevaluation, which yielded a strong intra-observer correlation, except for the Pfizer system for rater 1, where a moderate correlation was observed (Table 3). There was also a strong correlation between the scores of the Pfizer system and those of the Shulman method for each rater (n=404; Table 3).

Table 3. Intra-observer correlation.

| Correlation | r* | 95%CI** | Interpretation of correlation |
|---|---|---|---|
| Pfizer | | | |
| Rater 1 (n=52) | 0.628 | 0.434–0.776 | Moderate |
| Rater 2 (n=52) | 0.821 | 0.704–0.900 | Strong |
| Shulman | | | |
| Rater 1 (n=52) | 0.843 | 0.701–0.946 | Strong |
| Rater 2 (n=52) | 0.819 | 0.617–0.968 | Strong |
| Pfizer vs. Shulman | | | |
| Rater 1 (n=404) | 0.744 | 0.681–0.795 | Strong |
| Rater 2 (n=404) | 0.702 | 0.639–0.755 | Strong |

*Spearman’s rank correlation coefficient with ordinal scores in tests; **95%CI: 95% confidence interval.

A weak correlation arose between years of scholarity and the scores in the Pfizer and Shulman methods (Table 4). In contrast, a very weak negative correlation appeared between the scores and age (Table 4).

Table 4. Correlation of Pfizer and Shulman scores with years of scholarity and age (n=404).

| Correlation | r* | 95%CI** | Interpretation of correlation |
|---|---|---|---|
| Pfizer | | | |
| Scholarity | 0.283 | 0.189 to 0.378 | Weak |
| Age | -0.170 | -0.267 to -0.073 | Very weak |
| Shulman | | | |
| Scholarity | 0.244 | 0.146 to 0.338 | Weak |
| Age | -0.174 | -0.270 to -0.081 | Very weak |

*Spearman’s rank correlation coefficient with ordinal scores in tests (median of raters); **95%CI: 95% confidence interval.

More participants were classified with decreased cognition through the Pfizer method (rater 1: 44.3%; rater 2: 42.1%) than through the Shulman system (rater 1: 26.5%; rater 2: 16.3%) (p≤0.001; n=404).
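To make the comparison above concrete, the following sketch runs a Pearson chi-square test on a 2×2 table for rater 1. The counts are an assumption: they are reconstructed by rounding the reported percentages of 404 drawings, and the article does not report the underlying table or whether a continuity correction was applied. The function name `chi_square_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


N_DRAWINGS = 404
# Assumed counts, rounded from the percentages reported for rater 1.
pfizer_abnormal = round(0.443 * N_DRAWINGS)   # 179
shulman_abnormal = round(0.265 * N_DRAWINGS)  # 107

chi2, p = chi_square_2x2(
    pfizer_abnormal, N_DRAWINGS - pfizer_abnormal,
    shulman_abnormal, N_DRAWINGS - shulman_abnormal,
)
print(round(chi2, 2), p < 0.001)
```

Because both systems were applied to the same drawings, a paired test such as McNemar’s would also be a reasonable choice; the sketch follows the chi-square test named in the Methods.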

DISCUSSION

There are several methods of scoring the CDT. Shulman, one of the pioneers of its use, advocates that the simpler the scoring system the better, because the more complicated and lengthy methods did not appear to add significant value to the psychometric properties or clinical utility of this test. 11,12 However, even the 5-item Shulman scoring system was considered to be somewhat difficult. 6 An easier scoring system based on four items was proposed by Borson et al. 15 and adopted by the Consortium to Establish a Registry for Alzheimer Disease (CERAD) 16 and by Pfizer Inc. and Eisai Inc., 8 a fact that popularized the test under the name Pfizer method in some circles.

The main purpose was to compare the Shulman and the Pfizer systems in a large series of adult and elderly people to assess the presence of any difference in terms of inter- and intra-observer ratings. The higher the correlations between the scores, the more consistent the ratings. The inter-observer and intra-observer correlations demonstrated that the Shulman and the Pfizer systems are consistently similar when applied by trained raters. This suggests that the two methods could be applied in primary care, rendering similar results in terms of rating properties. In this setting, the easier scoring system, namely the Pfizer system, would be more suitable for cognitive assessment through the CDT in primary care.

A secondary objective was to compare the interpretation of the results of each method. In this unselected sample, more participants were classified with decreased cognition by the Pfizer system than by the Shulman method. We did not compare these results against a gold standard diagnostic test, which should be considered a limitation of our study; even so, the finding raises the possibility that the Pfizer method could be more sensitive for screening purposes than the Shulman system. A recent systematic review and meta-analysis of the diagnostic accuracy of the CDT proposed its widespread use in the diagnosis of dementia, citing the Shulman method as the most studied, but with some concern about the understanding of its scoring system. 6 The CDT seems to be a robust screening test for Alzheimer’s disease when compared with the Mini-Mental State Examination (MMSE). 17 In contrast, previous studies have stated that the CDT is not a good screening tool for the diagnosis of mild cognitive impairment. 7,18 An advisable approach to improve the diagnostic performance of the CDT is to combine it with other tests, especially the MMSE. 17,19,20

Scholarity seems to influence performance in the CDT. Small cross-sectional studies suggested that the CDT is not sensitive enough to detect very mild dementia either in educated people 7 or in those with limited education. 5 Illiterate patients can be successfully screened for Alzheimer’s disease using well-known screening instruments in combined protocols (e.g., CDT and MMSE). 21 A study that analyzed the qualitative elements of the CDT, rather than the score alone, showed that among participants without cognitive impairment, those with lower education often presented graphic difficulties, conceptual deficits, and spatial deficits. 10 Our results confirmed that more educated people exhibit slightly better performance in the CDT.

Finally, there is the question of age and performance in the CDT. A previous study with 180 adults (47–82 years) found no influence of age on CDT scores. 10 However, in our sample of 404 adults (50–89 years), a very weak negative correlation appeared between the scores and age, that is, the older the subjects, the worse the performance. This finding aligns with a large Brazilian study that also showed a negative correlation between age and cognitive performance, although not employing the CDT. 22

We must recognize the profile of our sample as a limitation, since most participants were Caucasian women living in the urban area. This restricts the generalization of our results to other population groups, namely, men, non-Caucasians, and people living in the countryside. Another concern is that the Elo-Creati cohort is representative of adult and elderly people engaged in occupational activities related to healthy aging, far from a sample of patients with declared cognitive decline or dementia. Thus, our population resembles a primary care subset rather than a secondary or tertiary care one. Our conclusions may not apply to groups composed solely of patients with cognitive problems.

In summary, our results suggest that the simplest form of scoring the CDT, the Pfizer system, can be employed in primary care for evaluating cognitive decline, with an easier interpretation than the Shulman method. The combination of the CDT with other cognitive tests, especially the MMSE, is advisable for a thorough evaluation, taking into account educational level and age as influencing factors.

Footnotes

This study was conducted by the Elo-Creati Cohort Group and Faculdade de Medicina, Universidade de Passo Fundo, Passo Fundo, RS, Brazil.

Funding: none.

REFERENCES

1. Livingston G, Huntley J, Sommerlad A, Ames D, Ballard C, Banerjee S, et al. Dementia prevention, intervention, and care: 2020 report of the Lancet Commission. Lancet. 2020;396(10248):413–46. doi: 10.1016/S0140-6736(20)30367-6.
2. Prince M, Bryce R, Albanese E, Wimo A, Ribeiro W, Ferri CP. The global prevalence of dementia: a systematic review and metaanalysis. Alzheimers Dement. 2013;9(1):63–75. doi: 10.1016/j.jalz.2012.11.007.
3. Langa KM, Levine DA. The diagnosis and management of mild cognitive impairment: a clinical review. JAMA. 2014;312(23):2551–61. doi: 10.1001/jama.2014.13806.
4. Paddick SM, Gray WK, McGuire J, Richardson J, Dotchin C, Walker RW. Cognitive screening tools for identification of dementia in illiterate and low-educated older adults, a systematic review and meta-analysis. Int Psychogeriatr. 2017;29(6):897–929. doi: 10.1017/S1041610216001976.
5. Scarabelot LF, Monteiro MM, Rubert MCS, Zetola VHF. Is Mini-Mental State Examination the best cognitive screening test for less educated people? Arq Neuro-Psiquiatr. 2019;77(5):330–4. doi: 10.1590/0004-282X20190043.
6. Park JK, Jeong EH, Seomun GA. The clock drawing test: A systematic review and meta-analysis of diagnostic accuracy. J Adv Nurs. 2018;74(12):2742–54. doi: 10.1111/jan.13810.
7. Powlishta KK, Von Dras DD, Stanford A, Carr DB, Tsering C, Miller JP, et al. The clock drawing test is a poor screen for very mild dementia. Neurology. 2002;59(6):898–903. doi: 10.1212/wnl.59.6.898.
8. Wang P, Shi L, Zhao Q, Hong Z, Guo Q. Longitudinal changes in Clock Drawing Test (CDT) performance before and after cognitive decline. PLoS One. 2014;9(5):e97873. doi: 10.1371/journal.pone.0097873.
9. Allone C, Lo Buono V, Corallo F, Bonanno L, Palmeri R, Di Lorenzo G, et al. Cognitive impairment in Parkinson’s disease, Alzheimer’s disease, and vascular dementia: the role of the clock-drawing test. Psychogeriatrics. 2018;18(2):123–31. doi: 10.1111/psyg.12294.
10. Fabricio AT, Aprahamian I, Yassuda MS. Qualitative analysis of the Clock Drawing Test by educational level and cognitive profile. Arq Neuropsiquiatr. 2014;72(4):289–95. doi: 10.1590/0004-282x20140004.
11. Shulman KI. Clock-drawing: is it the ideal cognitive screening test? Int J Geriat Psychiatry. 2000;15(6):548–61. doi: 10.1002/1099-1166(200006)15:6<548::aid-gps242>3.0.co;2-u.
12. Mainland BJ, Amodeo S, Shulman KI. Multiple clock drawing scoring systems: simpler is better. Int J Geriatr Psychiatry. 2014;29(2):127–36. doi: 10.1002/gps.3992.
13. Shulman KI, Shedletsky R, Silver IL. The challenge of time: clock drawing and cognitive function in the elderly. Int J Geriatr Psychiatry. 1986;1(2):135–40.
14. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas. 1973;33(3):613–9. doi: 10.1177/001316447303300309.
15. Borson S, Brush M, Gil E, Scanlan J, Vitalino P, Chen J, et al. The Clock Drawing Test: utility for dementia detection in multiethnic elders. J Geront Med Sci. 1999;54A:M534–M540. doi: 10.1093/gerona/54.11.m534.
16. Pinto E, Peters R. Literature review of the Clock Drawing Test as a tool for cognitive screening. Dement Geriatr Cogn Disord. 2009;27:201–13. doi: 10.1159/000203344.
17. Aprahamian I, Martinelli JE, Neri AL, Yassuda MS. The accuracy of Clock Drawing Test compared to that of standard screening tests for Alzheimer’s disease: results from a study of Brazilian elderly with heterogeneous educational backgrounds. Int Psychogeriatr. 2010;22(1):64–71. doi: 10.1017/S1041610209991141.
18. Ehreke L, Luppa M, König HH, Riedel-Heller SG. Is the clock drawing test a screening tool for the diagnosis of mild cognitive impairment? A systematic review. Int Psychogeriatr. 2010;22(1):56–63. doi: 10.1017/S1041610209990676.
19. Brodaty H, Moore CM. The clock drawing test for dementia of the Alzheimer’s type: a comparison of three scoring methods in a memory disorders clinic. Int J Geriatr Psychiatry. 1997;12(6):619–27.
20. Olazarán J, Hoyos-Alonso MC, del Ser T, Garrido Barral A, Conde-Sala JL, Bermejo-Pareja F, et al. Practical application of brief cognitive tests. Neurologia. 2016;31(3):183–94. doi: 10.1016/j.nrl.2015.07.009.
21. Aprahamian I, Martinelli JE, Cecato J, Yassuda MS. Screening for Alzheimer’s disease among illiterate elderly: accuracy analysis for multiple instruments. J Alzheimers Dis. 2011;26(2):221–9. doi: 10.3233/JAD-2011-110125.
22. Castro-Costa E, Lima-Costa MF, Andrade FB, Souza PRB Jr, Ferri CP. Cognitive function among older adults: the ELSI-Brazil results. Rev Saude Publica. 2018;52 Suppl 2:4s. doi: 10.11606/S1518-8787.2018052000629.

Articles from Dementia & Neuropsychologia are provided here courtesy of Academia Brasileira de Neurologia
