Author manuscript; available in PMC: 2018 Dec 1.
Published in final edited form as: Schizophr Res. 2017 Mar 9;190:74–76. doi: 10.1016/j.schres.2017.03.017

Consistency checks to improve measurement with the Positive and Negative Syndrome Scale (PANSS)

Jonathan Rabinowitz a,*, Nina R Schooler b, Ariana Anderson c, Lindsay Ayearst d, David Daniel e, Michael Davidson f, Anzalee Khan g, Bruce Kinon h, Francois Menard i, Lewis Opler j, Mark Opler k, Joanne B Severe l, David Williamson m, Christian Yavorsky n, Jun Zhao o; ISCTM ALGORITHMS/FLAGS TO IDENTIFY CLINICAL INCONSISTENCY IN THE USE OF RATING SCALES IN CNS RCTs working group members
PMCID: PMC5662474  NIHMSID: NIHMS913153  PMID: 28285023

Abstract

The International Society for CNS Clinical Trials and Methodology convened an expert working group that assembled consistency/inconsistency flags for the Positive and Negative Syndrome Scale (PANSS). Twenty-four flags were identified and grouped by the extent to which they represent error (Possibly, Probably, or Very probably or definitely). The flags were applied to assessments from the NEWMEDS data repository and from the CATIE clinical trial. Almost 40% of ratings had at least one inconsistency flag raised and 10% had two. Applying the flags to clinical ratings can improve the reliability and validity of trials.


Symptom manifestations and changes in psychiatric illnesses can be subtle, and even small imprecision in measurement can lead to over- or under-estimation of change. One strategy to improve measurement is to conduct logical consistency checks between item responses (i.e., cross-sectionally) and between test administrations (i.e., longitudinally) of a rating scale, bearing in mind that some degree of inconsistency between items within and across administrations is to be expected because of subject-based variability in item interpretation, attention, and other factors.

Consistency checks can be used to flag instances of possible error, regardless of source. The same process may also be used to identify raters who might benefit from additional training, in keeping with best-practice approaches for completing standardized assessments; to provide ongoing feedback to raters; to assess the overall quality of assessments; or to identify patients with atypical or anomalous symptom presentations. The Positive and Negative Syndrome Scale (PANSS) is used widely in schizophrenia clinical trials (Kay et al., 1987). Administering the scale requires attention to the details of its scoring rules and criteria as well as to the complexity of the disorder itself. Enhanced precision in administering the scale could increase the validity of these trials.

1. Methods

ISCTM (the International Society for CNS Clinical Trials and Methodology) convened a working group of PANSS experts from academia and industry, including one of the scale's authors, which over the last two years assembled potential consistency indicators or “flags” for the PANSS (the complete list of working group members appears in Appendix A). The general strategy was to define overly consistent scoring patterns, irregular scoring patterns, and incompatibilities in scoring among items within the scale and between administrations. Much of the process was guided by the explicit language in the anchor points and the definition/basis for rating in the PANSS; for example, G4 Tension cannot be rated if the patient has not endorsed anxiety. After defining a flag, the group ranked the probability that it represented error (Possibly, Probably, Very probably or Definitely).

To examine the incidence of such flags, they were applied to (a) the NEWMEDS database, composed of 121,635 assessments of 19,489 subjects from placebo-controlled trials of second-generation antipsychotics (for a description see Rabinowitz et al., 2014), and (b) the 8849 assessments of 1450 subjects in CATIE (for a description see Lieberman et al., 2005).
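To make the flag-application step concrete, a minimal sketch follows. This is illustrative only, not the authors' analysis code: the dictionary representation of an assessment, the function names, and the tiny sample data are assumptions; the example flag (G4 tension greater than G2 anxiety, flag 14 in Table 1) comes from the paper.

```python
# Sketch: represent each PANSS assessment as a dict of item scores
# (keys follow PANSS item labels, e.g. "G2" anxiety, "G4" tension)
# and each consistency flag as a predicate over one assessment.

def flag_tension_without_anxiety(items):
    """Flag 14 in Table 1: G4 tension rated greater than G2 anxiety."""
    return items["G4"] > items["G2"]

# A registry of cross-sectional flags; only one is shown here.
FLAGS = [flag_tension_without_anxiety]

def proportion_flagged(assessments):
    """Proportion of assessments with at least one flag raised."""
    flagged = sum(1 for a in assessments if any(f(a) for f in FLAGS))
    return flagged / len(assessments)

# Two hypothetical assessments: the first violates the rating rule.
assessments = [
    {"G2": 3, "G4": 5},  # tension exceeds anxiety -> flagged
    {"G2": 4, "G4": 2},  # consistent -> not flagged
]
print(proportion_flagged(assessments))  # 0.5
```

Structuring each flag as an independent predicate makes it straightforward to tally incidence per flag, as in Table 1, or per assessment.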

2. Results

Twenty-four flags were identified and graded by the extent to which they represent error (Possibly, Probably, or Very probably or definitely); each flag applies either among items within a single PANSS administration or across repeated administrations.

Table 1 presents the flags, the ratings by the expert group, and the percentage of assessments in NEWMEDS and CATIE identified by each flag. Almost 40% of NEWMEDS and CATIE ratings had at least one flag raised, and approximately 10% of the ratings had two flags raised. The most common inconsistencies, in order of the extent to which they were judged to represent error, were as follows: same response on all items from the previous visit; depression (G6) moderate severe or greater with motor retardation (G7) less than mild; lack of spontaneity (N6) 2 points greater than poor rapport (N3); either hostility (P7) or poor impulse control (G14) greater than mild with the other differing by more than 2 points; tension (G4) greater than anxiety (G2); hallucinatory behavior (P3) moderate severe or greater with preoccupation (G15) below moderate severe; and conceptual disorganization (P2) moderate severe or greater with difficulty in abstract thinking (N5) below moderate severe.

Table 1.

Flags of possible clinical inconsistency in the Positive and Negative Syndrome Scale (PANSS).

% of NEWMEDS assessments (n = 121,635); % of CATIE assessments (n = 8849)
High flags (very probably or definitely error)
Proportion of ratings with at least 1 “high” flag: 14.9% (NEWMEDS); NA for CATIE, whose assessments were 1 to 3 months apart
Individual “high” flags:
1. Same response on all 30 items from previous visit* 5.6%
2. Same response on 29 items from previous visit* 8.0%
3. Same response on 28 items from previous visit* 10.9%
4. Same response on 27 items from previous visit* 14.3%
5. Change from 1 to 7 on an item from previous visit* 0.1%
6. Change from 7 to 1 on an item from previous visit* 0.1%
7. Change of more than 40 on total score from previous visit* 0.2%
8. Change of 50% or more on total score from previous visit* (e.g., (80–40)/80) 0.4%
9. P5 grandiosity 5, 6 or 7 & P1 delusions less than 3 0.05% 0.1%
10. P6 suspiciousness 6 or 7 & P1 delusions less than 3 0.1% 0.01%
11. G1 somatic concerns 6 or 7 & P1 delusions less than 3 0.1% 0.1%
12. G3 guilt feelings 6 or 7 & P1 delusions less than 3 0.04% 0.1%
13. G9 unusual thought 5 or more & P1 delusions less than 3 0.3% 0.1%
Medium flags (probably an error)
Proportion of ratings with at least 1 “medium” flag 19.6% 17.7%
14. G4 tension is greater than G2 anxiety 17.2% 14.1%
15. G6 depression 5 or greater and G7 motor retardation less than 3 1.9% 4.0%
16. G7 motor retardation 6 or greater & N6 lack of spontaneity less than 5 0.1% 0.1%
17. N4 passive social withdrawal & G16 active social avoidance both 7 0.2% 0.02%
18. G7 motor retardation 5 or greater & P4 excitement 4 or more 0.6% 0.1%
19. Among P5, P6, G1 and G3, more than 1 is 7 0.1% 0.02%
Low flags (possibly an error)
20. N6 lack of spontaneity is 2 pts greater than N3 poor rapport 1.9% 1.9%
21. Difference of more than 2 points between G8 uncooperativeness and P7 hostility 2.1% 1.8%
22. P7 hostility, G8 uncooperativeness and/or G14 poor impulse control with a score of 4 or greater & at least one of the others with a score 2 points greater or less than that (e.g., P7 = 6 & G14 = 3 or P7 = 4 & G8 = 7) 5.5% 3.9%
23. P3 hallucinatory behavior 5 or greater & G15 preoccupation less than 5 11.2% 14.7%
24. P2 conceptual disorganization 5 or greater & N5 difficulty in abstract thinking is less than 5 4.9% 4.0%

NEWMEDS data from placebo- and active-controlled trials: number of patients = 19,489; number of assessments = 121,635. CATIE data: number of patients = 1450; number of assessments = 8849.

* Previous visit within 1 month.
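The longitudinal flags in Table 1 compare a visit against the previous visit (within one month). A hedged sketch of two of them follows; representing a visit as a list of 30 item scores, and using the previous-visit total as the denominator for percent change, are assumptions made here for illustration.

```python
# Sketch of two longitudinal flags from Table 1 (flags 1 and 8),
# each comparing the current visit with the previous one.

def same_response_all_items(prev, curr):
    """Flag 1: identical score on all 30 items at consecutive visits."""
    return len(prev) == len(curr) == 30 and prev == curr

def total_change_50_percent(prev, curr):
    """Flag 8: total score changed by 50% or more from the previous visit.
    Denominator choice (previous-visit total) is an assumption."""
    prev_total, curr_total = sum(prev), sum(curr)
    return abs(curr_total - prev_total) / prev_total >= 0.5

visit_a = [3] * 30  # total 90
visit_b = [3] * 30  # identical scores -> flag 1 raised
visit_c = [1] * 30  # total 30, a 67% drop -> flag 8 raised

print(same_response_all_items(visit_a, visit_b))  # True
print(total_change_50_percent(visit_a, visit_c))  # True
```

Because these flags need a prior visit at most one month back, they could not be evaluated in CATIE, whose assessments were 1 to 3 months apart.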

3. Discussion

The potential errors identified by the flags do occur, and their frequency is remarkably consistent between the NEWMEDS and CATIE data sets despite the difference between the two in frequency of assessment. The most common flag in both data sets is a rating of tension in the absence of anxiety, which violates a PANSS rating rule. Almost as frequent in both data sets is inconsistency between high levels of hallucinatory behavior and preoccupation. All other flags occur 5% of the time or less.

There are limitations to this approach. The first is that nearly all of these data patterns could occur in a real subject reporting legitimate experience. However, based on the NEWMEDS and CATIE data, each one is unlikely (and most are exceedingly unlikely). Therefore, in general, if more than one of the flags is present, the assessment should probably be questioned. Certain rarely occurring flags, specifically those that identify components of delusions, may signal questionable assessments even when no other flags are present.
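The multiple-flag heuristic described above can be sketched as a simple count over the flag predicates. The threshold, the function names, and the sample assessment below are illustrative assumptions, not a rule stated by the working group beyond "more than one flag".

```python
# Sketch: question an assessment when more than one flag is raised.
# Flag predicates operate on a dict of PANSS item scores; the two
# flags shown correspond to flags 14 and 23 in Table 1.

def count_flags(items, flags):
    """Number of consistency flags raised for one assessment."""
    return sum(1 for flag in flags if flag(items))

def should_question(items, flags, threshold=1):
    """Question the assessment if more than `threshold` flags fire."""
    return count_flags(items, flags) > threshold

flags = [
    lambda i: i["G4"] > i["G2"],              # tension > anxiety
    lambda i: i["P3"] >= 5 and i["G15"] < 5,  # hallucinations without preoccupation
]

# Hypothetical assessment raising both flags.
suspect = {"G2": 2, "G4": 4, "P3": 6, "G15": 3}
print(count_flags(suspect, flags))      # 2
print(should_question(suspect, flags))  # True
```

Rarely occurring delusion-related flags could be handled by weighting them so that a single occurrence already exceeds the threshold.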

The second is that raters given such feedback may be inclined to change ratings that are correct, or to review their ratings to make sure they are in keeping with the flags rather than rating based on what they observe. As noted by Andreasen (2007) and others, there is some danger in reducing any psychiatric activity to a checklist-driven process, and “phenomenology” and careful observation should not be abandoned in favor of a formulaic approach to assessment. We are aware of the concern that raters might use this work to “get the answers” correct; however, we note that “perfect” (overly consistent) ratings can also be identified. This work highlights the need for rater training to help improve the consistency and accuracy of ratings.

4. Conclusions

Analyzing items within the PANSS can help to identify potential signals of inconsistent ratings. These may be remediated with targeted training for raters showing a high level of flags, could be used to develop a metric for rater selection, and could serve as a strategy for sensitivity analyses examining the effects of questionable ratings in a trial. The flags may additionally help to identify other sources of error or anomaly, including those that derive from the patient rather than the rater. Finally, they may be used in the review of individual trials and could have a role in decisions about including trials in meta-analyses. The flags could also suggest possible improvements to the PANSS and might be useful for identifying inconsistent responses of subjects, which in turn might help to identify professional subjects.

Acknowledgments

The research leading to these results received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115008, whose resources are composed of an in-kind contribution from the European Federation of Pharmaceutical Industries and Associations (EFPIA) and a financial contribution from the European Union's Seventh Framework Programme (FP7/2007–2013). The funding source was not involved in the collection, analysis, or interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.

Appendix A

List of ISCTM working group members (2014–2015)

Last Name First Name
Anderson Ariana
Ayearst Lindsay
Berkowitz Linda
Bertzos Kristina
Binneman Brendon
Busner Joan
Daniel David
Davidson Michael
Davis Lori
Davis Vicki
de Swart Hans
Dunayevich Eduardo
Ellis Amy
Eriksson Hans
Farfel Gail
Hughes Christina
Inamdar Amir
Jovic Sofija
Kane John
Khan Anzalee
Kott Alan
Menard François
Morrison Randy
Murphy Christopher
Nations Kari
Opler Mark
Opler Lewis
Owen Randall
Phillips Glenn
Rabinowitz Jonathan
Raedler Thomas
Rault Hélène
Risinger Robert
Rosenthal Adena
Sachdeva Parsh
Sajatovic Martha
Schooler Nina
Severe Joanne
Spiridonescu Laura
Stewart Michelle
Vardy Julianna
Vornov James
Williamson David
Yavorsky William
Zhao Jun

Footnotes

First two authors co-chaired the working group. Other authors are listed in alphabetical order.

Contributors

Author Rabinowitz initiated the idea, and together with Schooler, created the working group that developed the consistency flags reported in the paper. Authors Anderson, Ayearst, Daniel, Davidson, Khan, Kinon, Menard, L. Opler, M. Opler, Severe, Williamson, Yavorsky and Zhao contributed flags and/or critical comments that helped in meaningful and substantial ways to this work. Rabinowitz analyzed the data and drafted the manuscript which was then reviewed by Schooler and then by all of the other authors who approved the final manuscript.

Conflict of interest

Jonathan Rabinowitz has received research grant(s) support and/or travel support and/or speaker fees and/or consultant fees from Janssen (J&J), Eli Lilly, Pfizer, BiolineRx, Roche, Abraham Pharmaceuticals, Pierre Fabre, Intra-cellular Therapies, Minerva and Amgen.

Nina Schooler has received honoraria and travel support for attendance at advisory board meetings or consultation from Allergan, Alkermes, Forum, Roche and Sunovion.

Ariana Anderson has received grants or contracts from Janssen Research and Development and BlackThorn Therapeutics.

Lindsay Ayearst is employed by Multi-Health Systems Inc. (MHS), the publisher of the PANSS. She does not directly benefit from sales of the PANSS either through royalty payments, sales commissions, or any other direct means.

David Daniel is a Full time Employee of Bracket Global, LLC.

Michael Davidson has received research grant(s) support and/or travel support and/or speaker fees and/or consultant fees from Eli Lilly, Servier, and Minerva and holds stocks in Minerva and Tangent Research.

Anzalee Khan has no conflicts of interests to report.

Bruce Kinon is an employee and shareholder of H Lundbeck and Shareholder and former employee of Eli Lilly and Company.

Francois Menard is a former employee of H Lundbeck.

Mark Opler is an employee of ProPhase LLC, receives royalties from MHS Inc. on the sale of the PANSS Manual and has grant funding from Stanley Foundation, QNRF, and NIH.

Lewis Opler has no conflicts of interests to disclose.

Joanne Severe has no conflicts of interests to disclose.

David Williamson is an employee of Janssen Scientific Affairs, LLC and a stockholder of Johnson & Johnson.

Christian Yavorsky is a principal at CRONOS CCS, a company that provides risk-based data monitoring that utilizes algorithms like those described in this paper to identify sources of potential error in clinical trials.

Jun Zhao has no conflicts of interests to disclose.

References

  1. Andreasen NC. DSM and the death of phenomenology in America: an example of unintended consequences. Schizophr. Bull. 2007;33(1):108–112. doi: 10.1093/schbul/sbl054.
  2. Kay SR, Fiszbein A, Opler LA. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr. Bull. 1987;13(2):261–276. doi: 10.1093/schbul/13.2.261.
  3. Lieberman JA, Stroup TS, McEvoy JP, Swartz MS, Rosenheck RA, Perkins DO, Keefe RS, Davis SM, Davis CE, Lebowitz BD, Severe J, Hsiao JK. Effectiveness of antipsychotic drugs in patients with chronic schizophrenia. N. Engl. J. Med. 2005;353(12):1209–1223. doi: 10.1056/NEJMoa051688.
  4. Rabinowitz J, Werbeloff N, Caers I, Mandel FS, Stauffer V, Menard F, Kinon BJ, Kapur S. Determinants of antipsychotic response in schizophrenia: implications for practice and future clinical trials. J. Clin. Psychiatry. 2014;75(4):e308–e316. doi: 10.4088/JCP.13m08853.
