Abstract
Assessment of PD-L1 expression by immunohistochemistry (IHC) has been controversial since its introduction. The methods of assessment and the range of assays and platforms all contribute to confusion. Perhaps the most challenging aspect of PD-L1 IHC is the combined positive score (CPS) method of interpretation. While the CPS method is prescribed for more indications than any other PD-L1 scoring system, its reproducibility has never been rigorously assessed. In this study, we collected a series of 108 gastric or gastroesophageal junction cancer cases, stained them with the FDA-approved 22C3 assay, scanned them, and then circulated them to 14 pathologists at 13 institutions to assess interpretative concordance for the CPS system. We find that higher cut-points (10 or 20) perform better than CPS <1 versus ≥1. We use the ONEST algorithms to assess how the CPS system might perform in the real-world setting and find that the <1/≥1 cut-point shows only 30% overall percent agreement among pathologist raters, with a plateau occurring at 8 raters. The raters do better at higher cut-points. However, even the best cut-point, <20 versus ≥20, is disappointing, with overall percent agreement plateauing at 70% (at 7 raters). While there is no ground truth for CPS, we compare the score to quantitative mRNA measurement and show no relationship between score (at any cut-point) and mRNA amount. In summary, we show that CPS shows high subjective variability among pathologist readers and is likely to perform poorly in real-world usage. This system may be the root cause of the poor specificity and relatively low predictive value of IHC companion diagnostic tests for PD-1 axis therapies that use the CPS system.
Introduction
With the increased use of immune checkpoint inhibitors (ICIs) in solid tumors, programmed cell death ligand-1 (PD-L1) immunohistochemistry (IHC) assays have remained at the forefront as biomarkers of response 1. While the expression of PD-L1 on tumor cells was initially thought to be the main ICI target, as data evolved it was found that “immune cells” (defined as lymphocytes, dendritic cells, macrophages, and other cells of the host immune system) were also critical to the mechanism of action of ICI drugs 2,3. To accommodate this finding, methods of scoring PD-L1 IHC switched from scoring tumor cell or immune cell expression in separate categories to a single combined score, the combined positive score (CPS).
To date there are four Food and Drug Administration (FDA)-approved PD-L1 assays for solid tumors, but the scoring systems are tumor specific 4. While lung cancer maintains separate tumor cell and immune cell scoring, nearly all other tumor types have migrated to the CPS, which measures PD-L1 expression on both tumor cells and mononuclear inflammatory cells in tissue. Currently, CPS is the scoring system for the approved companion diagnostic test in gastric cancer 5, as well as in other solid tumors, including cervical, urothelial, esophageal squamous cell carcinoma, head and neck squamous cell carcinoma, and triple-negative breast cancer 6.
CPS is calculated by counting all cells with PD-L1 staining in the tumor area (membranous staining in tumor cells and membranous and/or cytoplasmic staining in lymphocytes or macrophages), dividing this value by the total number of viable tumor cells, and multiplying by 100 7. Only the expression levels on lymphocytes and macrophages are included; other PD-L1-expressing immune cell types in the tumor bed should be disregarded. Macrophages located within the tumor stroma or intercalating the tumor are particularly difficult to count and can lead to inaccuracy, but they are included in the cells to be counted. Some cytoplasmic or nuclear parts of these cells, or even the whole nucleus, may be out of the plane of the 5-micron histologic section; thus, using the nucleus to define a cell is not practical. This issue and other problems make the CPS score highly variable among pathologist observers 8, in spite of efforts to train pathologists in methods of assessment 7. Thus the “counting method” used in the clinical trials is both challenging and time consuming. In a recent CAP PD-L1 quality assessment survey (CAP 2021B), responses to survey questions suggest that fewer than 3% of pathologists actually try to count each cell and compute a score 9.
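The arithmetic described above is simple; only the counting is hard. As an illustrative sketch (not part of the study's workflow), the score can be written as a single function. The 100-cell minimum and the cap of the score at 100 follow the 22C3 interpretation manual; the parameter names are ours:

```python
def combined_positive_score(pdl1_pos_tumor_cells: int,
                            pdl1_pos_lymphocytes: int,
                            pdl1_pos_macrophages: int,
                            viable_tumor_cells: int) -> float:
    """CPS = (PD-L1+ tumor cells + PD-L1+ lymphocytes + PD-L1+ macrophages)
    / (viable tumor cells) x 100, capped at 100."""
    if viable_tumor_cells < 100:
        # A specimen with fewer than 100 viable tumor cells is inadequate for scoring.
        raise ValueError("At least 100 viable tumor cells are required")
    stained = pdl1_pos_tumor_cells + pdl1_pos_lymphocytes + pdl1_pos_macrophages
    return min(100.0, 100.0 * stained / viable_tumor_cells)
```

Note that the numerator can exceed the denominator (immune cells are counted on top of tumor cells), which is why the cap is needed; the variability this paper documents lies entirely in estimating the four input counts.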
The FDA-approved companion diagnostic test for the use of the anti-PD-1 ICI pembrolizumab in gastric cancer is Dako PD-L1 IHC 22C3 PharmDx, with CPS ≥1 as the cut-off for positivity 4. This requires the counting or estimation of stained cells, as described above. Given the increasing success of ICIs in gastric cancer and their expansion into many other tumor types, and the variability of PD-L1 CPS scoring on tissues, we believe it is important to assess the performance of CPS in the real-world setting. Our goal was to assess concordance in PD-L1 22C3 scoring among pathologists from multiple institutions. Since most of the pathologist participants are academics, it could be argued that this is the “ivory tower” world, not the real world. However, we aim to analyze assessment of the CPS assay among multiple pathologist readers from a range of institutions in the USA. We also compared the subjective CPS measure to quantitative mRNA measurement.
Materials and Methods
Cohort information
We retrospectively collected 112 non-serial biopsies from patients seen at Yale School of Medicine from 2010 to 2020. Inclusion criteria were a diagnosis of gastroesophageal junction or gastric cancer and available formalin-fixed, paraffin-embedded (FFPE) tissue from the archives of the Department of Pathology. No treatment information or clinical annotations were collected. Available archival hematoxylin and eosin (H&E) slides, paraffin-embedded tissues, and FFPE slides were collected for each case and subsequently reviewed by a board-certified pathologist (XZ) to ensure tissue viability. Written informed consent or a waiver of consent was provided by all patients. This study was approved under Yale Human Investigation IRB protocol ID #9505008219.
Slide preparation and distribution
The collected FFPE slides were designated for hematoxylin and eosin (H&E) staining, staining with Dako PD-L1 IHC 22C3 PharmDx, or closed-system quantitative reverse transcription polymerase chain reaction (qRT-PCR).
H&E staining was performed by the Yale Pathology Tissue Services core facility. The Dako PD-L1 IHC 22C3 pharmDx assay was performed on serial sections exactly according to the manufacturer's instructions 10. Briefly, FFPE slides were deparaffinized and rehydrated, then underwent antigen retrieval in the PT Link using EnVision FLEX low pH target retrieval solution. After a wash in EnVision FLEX wash buffer, slides were transferred to the Autostainer Link 48 and underwent chromogenic staining with 22C3. Slides were reviewed by two board-certified pathologists (XZ and DLR) to ensure quality control (intact tissue and at least 100 tumor cells). Of the 112 cases, 108 passed all quality control parameters. Both H&E and PD-L1 22C3-stained slides were scanned on the Aperio ScanScope XT platform using the 20x setting.
Slide distribution and pathologist information
Digital files of the scanned H&E and 22C3 images were distributed to 14 selected peer pathologists from 13 institutions in the United States. Information on the pathologists' experience was collected, including years of experience, years of diagnostic anatomic pathology practice, responsibility for review of PD-L1 cases, and volume of PD-L1 cases assessed per month. The pathologists had 5–27 years of experience, and most signed out <20 PD-L1 cases a month (Table 1).
Table 1.
Pathologist information collected from 12 of the 14 pathologists, including experience and type of PD-L1 assay performed.
| Years in practice? | Years of PD-L1 sign-out experience? | Estimated PD-L1 case volume/month? | Do you perform PD-L1 IHC stain in house? | If you do IHC in house, which assay/kit do you use? | Is PD-L1 IHC a reflex order by pathologist? | Is PD-L1 IHC ordered by clinician? | Have you taken any training for PD-L1 interpretation? If yes, online or live training? |
|---|---|---|---|---|---|---|---|
| 4 | 4 | 20 | Yes | Dako, PD-L1 clone 22C3 | No | Yes | No |
| 7 | 4 | 10 | Yes | Dako, PD-L1 clone 22C3 | No | Yes | Yes |
| 10 | 5 | 20 | Yes | E1L3N LDT | No | Yes | Yes |
| 11 | 3 | 20 | Yes | Dako, PD-L1 clone 22C3 | Yes | Yes | Yes |
| 12 | 1 | 10 | No | N/A | Yes | Yes | Yes |
| 13 | 6 | 60 | Yes | Dako, PD-L1 clone 22C3, plus others | Yes | No | No |
| 13 | 5 | 200 | Yes | Dako, PD-L1 clone 22C3, plus others | Yes | Yes | Yes |
| 14 | 0 | 20 | No | N/A | N/A | Yes | No |
| 17 | 3 | 80 | Yes | Dako, PD-L1 clone 22C3, plus others | No | Yes | No |
| 20 | 5 | 10 | Yes | Dako, PD-L1 clone 22C3 | No | Yes | Yes |
| 21 | 3 | - | Yes | Dako, PD-L1 clone 22C3, plus others | Yes | Sometimes | - |
| 27 | 5 | 25 | Yes | E1L3N LDT | Yes | Generally not | Yes |
Pathologist scoring
Pathologists were instructed to provide two CPS-based scores: (1) score each case as <1 or ≥1, and (2) give a CPS score to the nearest 5 or 10 (1, 5, 10, etc.) as an ordinal score. Examples of concordant and discordant cases scored as <1 and ≥1 are shown in Figure 1. Pathologists also indicated whether they estimated or counted to get the CPS scores. Of the 14 pathologists, two counted, two used a mix of counting and estimates (estimating all cases but counting those that were >1), and the rest (71.4%, n=10) used only estimates.
Figure 1.

Representative images from the cohort representing cases with 100% (A, B) and <90% (C, D) agreement among pathologists. 20x images shown with magnified views (insets). (A) Example case 1: 100% of pathologists called “<1”. (B) Example case 2: 100% of pathologists called “≥1”. (C) Example case 3: 79% of pathologists called “<1”. (D) Example case 4: 79% of pathologists called “≥1”.
GeneXpert qRT-PCR assay
The research use only (RUO) version of the GeneXpert® PD-L1 panel prototype assay was performed as previously described 11. Briefly, 5-μm FFPE whole tissue sections (WTS) were collected and lysed using 20 μl Proteinase K and 1.2 mL FFPE lysis reagent, then incubated for 30 min at 80°C. The sample was then transferred and 1.2 mL of ≥95% ethanol was added. The samples were vortexed, centrifuged, and transferred to a PD-L1 panel prototype cartridge, and 520 μl of each sample was loaded on the GeneXpert (GX) system. This closed-system assay performs quantitative RT-PCR in one step and gives a cycle threshold (Ct) output value for the endogenous control, POLR2J, and four target genes: CD274, PDCD1LG2, CD8A, and IRF1. Of the 112 cases, 111 passed the internal quality controls of the cartridges. Results were then normalized by subtracting the Ct of each target from the Ct of the control gene POLR2J, defined as delta Ct (dCt). Final data were transformed and presented as dCt+10 for easier interpretation. Amplicons were designed to be short to minimize the effect of FFPE fragmentation, and linearity studies showed that all genes performed consistently.
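The delta-Ct normalization described above (the target Ct subtracted from the POLR2J Ct, then shifted by +10 for presentation) amounts to the following illustrative Python sketch; the function name is ours:

```python
def normalized_expression(ct_target: float, ct_polr2j: float) -> float:
    """dCt + 10, where dCt = Ct(POLR2J) - Ct(target).

    A lower target Ct means the target amplified earlier (more mRNA),
    so higher returned values indicate higher target expression
    relative to the POLR2J endogenous control.
    """
    return (ct_polr2j - ct_target) + 10.0
```

For example, a CD274 Ct three cycles below the POLR2J Ct would yield a presented value of 13.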
Statistical analysis
Spearman correlation was used to assess associations between experience and likelihood of agreeing with other pathologists. To measure the change in overall percent agreement (OPA) as a function of the number of observers, Observers Needed to Evaluate Subjective Tests (ONEST) plots were used 12. These analyses were carried out using the onest and irr packages in RStudio as previously described 13. Briefly, from the n! possible orderings of the pathologists (14! ≈ 8.72×10^10), the algorithm randomly selected 100 permutations and plotted OPA against the number of pathologists for each. To interpret the plot, the plateau of the graphs is determined to find the best OPA (y-axis) that can be achieved with the minimum number of pathologists (x-axis). The plateau is defined by examining the lower limit of the graph and identifying where the change between successive observer counts is 2% or less. All data analysis was performed using RStudio (version 3.6.0) and PRISM 9.1.2 (GraphPad, San Diego, CA).
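The resampling logic behind the ONEST plots can be sketched as follows. This is an illustrative Python re-implementation, not the onest R package used in the study; it assumes, as in the ONEST papers, that OPA among k readers is the fraction of cases on which all k readers assign the same category:

```python
import random

def onest_curves(scores, n_permutations=100, seed=0):
    """ONEST resampling sketch.

    scores: one list of categorical calls per rater, all over the same cases.
    Returns one OPA curve per random rater permutation, where each curve gives,
    for k = 2..n_raters, the fraction of cases on which ALL of the first k
    raters in that permutation agree.
    """
    rng = random.Random(seed)
    n_raters, n_cases = len(scores), len(scores[0])
    curves = []
    for _ in range(n_permutations):
        order = rng.sample(range(n_raters), n_raters)  # random rater ordering
        curve = []
        for k in range(2, n_raters + 1):
            subset = [scores[i] for i in order[:k]]
            # A case counts as agreement only if every rater gave the same category.
            agree = sum(1 for c in range(n_cases)
                        if len({rater[c] for rater in subset}) == 1)
            curve.append(agree / n_cases)
        curves.append(curve)
    return curves
```

Because agreement requires unanimity, each curve is non-increasing in k, which is why the plots in Figures 4 and 5 fall and then plateau as readers are added.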
Results
Pathologist assessment using CPS
Of the 108 cases read by all pathologists, 51.9% (n=56) had >90% agreement when pathologists scored using the binary system (CPS <1 or CPS ≥1) as the positive/negative cut-off point. When using a CPS cut-off of 10, pathologists had >90% agreement on 81.5% (n=88) of the cases (Figure 2A, B). Since some FDA-approved companion diagnostics for solid tumors use the CPS categories <1, 1–20, and >20, we also looked at those categories. Since relatively few cases express enough PD-L1 to be called CPS >20, our study is more limited for these very high scores; only 17 cases were called >20 by any pathologist in the study. The distribution of cases is shown in Figure 2C. When the 3 categories were used instead of the dichotomized categories, 9.3% (n=10) of the cases were given a score in each of the three categories by at least one pathologist (Figure 2D).
Figure 2.

Percentage of observers who called the cases positive using 3 categories. The x-axis shows the individual cases; the y-axis is the percent of 14 observers who called a case a certain CPS score. A solid vertical line represents 100% agreement. (A) CPS <1/≥1, (B) CPS <10/≥10, (C) CPS <1, 1–20, >20, and (D) magnified view of the 10 cases given scores in all 3 ranges (CPS <1, 1–20, >20).
To assess whether experience affected the likelihood of agreeing with other pathologists, years of experience, years signing out PD-L1 specifically, and volume of PD-L1 cases per month were analyzed. First, each case was assigned a score based on a majority consensus (>50%); three cases with an exact 50/50 split were excluded. For the remaining 105 cases, experience did not significantly affect the likelihood of a pathologist agreeing with the majority. This was also true when looking only at the cases with high agreement (>90%) for CPS <1/≥1 (Figure 3A–C). Using estimates rather than counts did not affect the likelihood of agreeing with other pathologists (not shown).
Figure 3.

Percentage agreement between pathologists/observers when looking at all of the cases, and only the cases where all pathologists agreed >90% of the time. Agreement versus (A) years as a practicing pathologist; (B) years signing out PD-L1 cases; (C) volume of PD-L1 cases signed out/ month. For all graphs, linear regressions showed no slopes were significantly non-zero.
While the FDA requires 3 pathologists at 3 independent sites for proof of assay reproducibility, each “reader” is compared to a consensus score to calculate overall percent agreement (OPA), which is shown in the FDA's published summary of safety and effectiveness data (SSED) documents 14. While this is the standard approach used for subjective companion diagnostic assays submitted to the FDA, we believe it may not represent real-world performance of these assays. In the real world there are at least hundreds, if not thousands, of pathologists who assess PD-L1 expression. To better understand pathologist reader agreement in the real world, we used the ONEST method described in the Methods, in which inter-rater agreement is illustrated as overall percent agreement (y-axis) versus the number of pathologist readers (x-axis). In previous work we showed how this could be a more accurate representation of subjective assay performance when there are many observers, as seen in the real world 12.
Here, ONEST plots were evaluated using both dichotomous and ordinal cut-points to reflect previous usage of CPS in clinical trials. For all cut-points evaluated, the OPA decreased as the number of readers/pathologists increased. For the dichotomous cut-point CPS <1/≥1, the OPA decreased and plateaued at 7 readers, reaching an OPA of 35.2% (Figure 4A). This suggests that overall percent agreement among pathologists in the real world may be as low as about 35%. By contrast, CPS <10/≥10 reached a plateau at 5 readers with an OPA of 72.3% (Figure 4B), and CPS <20/≥20 reached a plateau at 4 readers with an OPA of 86.6% (Figure 4C). Importantly, for all the cut-points, although the plateau defines the minimum OPA, there was also a broad range at a given number of readers, calculated as the difference between the minimum and maximum OPA (CPS <1/≥1, Δ=16.7%; CPS <10/≥10, Δ=18.5%; CPS <20/≥20, Δ=12.0%; Figure 4A–C, red brackets).
Figure 4.

ONEST plots showing overall percent agreement (OPA) between pathologists/observers as a function of the number of observers. Randomly selected curves (n=100) from all possible combinations of observers using 2-category cut-offs for PD-L1: (A) combined positive score (CPS) <1/≥1, (B) CPS <10/≥10, (C) CPS <20/≥20. Red lines indicate the range of the OPA from minimum to maximum at the plateau.
When there are more categories, there is always a higher likelihood of disagreement between readers. Nonetheless, we tested both 3- and 5-category scoring systems to illustrate this principle. We used two score sets: 3 categories (<1, 1–20, >20) and 5 categories (<1, 1–10, 10–20, 20–50, and >50). For 3 categories, the graph plateaued at 5 pathologists with the lowest OPA at 25.9%. For 5 categories, the graph plateaued at 4 pathologists with the lowest OPA at 29.6% (Figure 5A, B).
Figure 5.

ONEST plots showing overall percent agreement (OPA) between pathologists/observers using (A) 3 and (B) 5 category cutoffs for PD-L1 CPS.
While we believe that the ONEST method represents a good approach and is more attractive to pathologists who favor images or figures over tables, it is not the traditional statistical method for analysis of this type of data. Table 2 shows the more conventional analysis of inter-rater data including overall percent agreement, Fleiss’ kappa 15 and intraclass correlation coefficient (ICC) 16.
Table 2.
Statistical comparisons of agreement of 14 Pathologist scores at different combined positive score (CPS) cut points by overall percent agreement, Fleiss Kappa, and intraclass correlation coefficient.
| Scoring Category | OPA | Fleiss' kappa | ICC |
|---|---|---|---|
| CPS <1, ≥1 | 31.48 (22.72, 40.24) | 0.477 (0.458, 0.497) | 0.484 (0.403, 0.571) |
| CPS <10, ≥10 | 67.59 (58.77, 76.42) | 0.604 (0.584, 0.624) | 0.607 (0.538, 0.679) |
| CPS <20, ≥20 | 83.33 (76.30, 90.36) | 0.626 (0.606, 0.646) | 0.629 (0.562, 0.698) |
| CPS, 3 groups: <1, 1–20, >20 | 27.78 (19.33, 36.23) | 0.405 (0.389, 0.422) | 0.617 (0.538, 0.694) |
| CPS, 5 groups: <1, 1–10, 10–20, 20–50, >50 | 25.93 (17.66, 34.19) | 0.337 (0.322, 0.352) | 0.708 (0.640, 0.772) |
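The Fleiss' kappa values in Table 2 were computed with the irr package in R; for readers unfamiliar with the statistic, the standard formula can be sketched in Python from a case-by-category count matrix (an illustrative implementation, not the study's code):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters and nominal categories.

    counts: (n_cases x n_categories) matrix where counts[i][j] is the number
    of raters assigning case i to category j; every row must sum to the same
    number of raters n (here, 14 pathologists).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                      # raters per case
    p_j = counts.sum(axis=0) / counts.sum()        # overall category proportions
    # Per-case agreement: fraction of rater pairs that agree on each case.
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                             # mean observed agreement
    P_e = np.square(p_j).sum()                     # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

Perfect unanimity across cases yields kappa = 1, and agreement at chance level yields kappa ≤ 0, which is why the 0.34–0.63 values in Table 2 indicate only fair-to-substantial agreement.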
Quantitative mRNA
Due to the subjective nature of the PD-L1 IHC assay, there is no ground truth. Arguably, the assay should only be evaluated against patient outcome in clinical trials, but that setting does not address reproducibility of the assay. However, IHC is a biochemical test, and the molecular target of the drug is the PD-L1 protein. We and others have struggled to accurately measure the protein in situ 17,18. While protein levels do not equal mRNA levels, the measurement of mRNA in serial sections of IHC assays is quite simple and could represent a molecular ground truth against which to compare the subjective IHC assay. So here, the GeneXpert® PD-L1 panel prototype assay was completed on 111 of the 112 cases. For the purposes of this study, we focused on CD274, the mRNA that codes for PD-L1. We found no correlation between PD-L1 IHC scores and mRNA expression levels. When looking at the distribution of PD-L1 mRNA by average CPS <1/≥1, CPS <10/≥10, or CPS <20/≥20, none of the groups were significantly different (Figure 6).
Figure 6.

Assessment of continuous quantitative mRNA measurement compared to PD-L1 (A) combined positive score (CPS) <1/≥1, (B) CPS <10/≥10, (C) CPS <20/≥20. Two-tailed Mann-Whitney tests showed no significant differences.
Discussion
CPS, in concept, is a good solution for assessment of PD-L1 expression. As more was understood about the mechanism of action of ICIs 3, it was concluded that assessment of only tumor cells or only host immune cells was likely to give an incomplete picture. The problem is that, as designed, CPS is challenging to implement due to the difficulty of counting some important cell types. The FDA requirements for comparisons between readers and consensus reads are only comparisons of two reads (a reader against the consensus) 14. While SSEDs can show compelling overall percent agreement statistics, we argue that this approach does not reflect the real-world performance of the test. In fact, as shown in the ONEST plots (Figures 4 and 5), when two to three readers are compared, OPA ranges from very low (around 30% for 3 readers) to very high (>95% for 2 readers). While the FDA has made and enforced standards for subjective tests, it is not clear they are rigorous enough. While the CPS concept, assessment of both tumor and host cell expression, is attractive, the data in this paper suggest that new methods for assessment of PD-L1 expression in its various cell types should be sought to better serve patients in the future.
The most common current cut-point for the CPS assay is <1 versus ≥1. Unfortunately, our data suggest this is not a cut-point with high concordance. In this study, we evaluated pathologists' agreement when reading PD-L1 CPS in gastric and gastroesophageal junction cancer cases, where the approved cut-point is 1. Using the FDA-approved Dako PD-L1 IHC 22C3 PharmDx assay, we found that using CPS 10 or 20 as a cut-off improves OPA over CPS 1 (~60% versus ~30%). More recently, CPS <10 versus ≥10 has been used in breast cancer 19, and while that indication was not evaluated here, our data support the concept that higher CPS cut-points might generate better concordance. However, both the CPS 10 and CPS 20 cut-points still plateau at 60–70% OPA. That is substantially lower than better-established assays such as estrogen receptor in breast cancer, where the plateau is above 90% (data not shown). In breast cancer, CPS is only a companion diagnostic in the metastatic setting 19 and not in the neoadjuvant setting 20. Our data may suggest that the difference between the neoadjuvant and metastatic settings may be based on the assay, not the biology.
A recent global study by Nuti et al. showed the effectiveness of training in improving interobserver and intraobserver concordance rates for scoring PD-L1 CPS in multiple cancer types, reporting an interobserver concordance of 91.5% for gastric cancer with 120 observers 21. There are two key differences between our study and the Nuti et al. work. The first is that their pathologists were trained for 2 days prior to scoring a total of 30 slides (half of which were repeated the following day), showing the importance of uniform training. In contrast, our study was a single-review study that assessed what is currently occurring in academic pathology labs. The second and more important difference is how the overall percent agreement (OPA) was calculated. In Nuti et al., all OPAs were calculated between 2 readers (the test reader and the reference score). When comparing only 2 readers there can be a very high OPA (nearly 100%) or a very low OPA (around 40%), as shown in Figure 4A. After training, Nuti et al. found an average of 91.5%. However, it would be interesting for Nuti et al. to compare OPA, as we did, between trios, quartets, quintets, etc. of pathologists to see if the high OPAs fall apart even after training. We believe that the comparison of multiple readers in the calculation of the OPA (as illustrated in Figure 4) better represents the performance of the assay in the real world.
One of the challenges of CPS is the absence of a criterion standard. Absolute measurement of PD-L1 protein has been attempted 17,18 but never comprehensively proven to show association with response. Furthermore, since there is no gold standard for CPS, there is no way of effectively using image analysis and machine learning to help address the discrepancy we have shown. AI/machine learning is a powerful tool and could someday provide a method for improved reproducibility; however, it seems more likely that such a method would need to be integrated into the clinical trial. In the absence of ground truth for CPS, we tested a quantitative method for assessment of PD-L1 mRNA as a potential comparator but found that mRNA measured by qRT-PCR is not associated with the CPS score. Of note, this mRNA assay has not been shown to predict ICI treatment outcome in this setting, although the target mRNA is present in some signatures that do predict outcome 22.
There are a number of limitations to this work. CPS is assessed in many tumor types in combination with pembrolizumab indications, but only gastroesophageal junction and gastric cancer were used as test cases here. It is possible that other tumor types would show better (or worse) performance. Another limitation is the selection of cases. While we attempted serial collection, the Yale archives were not rich enough to provide the number and range of cases needed for this study, so we collected additional non-serial cases to enrich for cases expressing some level of PD-L1. In fact, another weakness of the study is the relative paucity of very high PD-L1-expressing cases. These cases are rare in the real world but were enriched in this test set in order to examine the pathologist readers' proficiency throughout the dynamic range of PD-L1 expression.
In summary, we find that CPS, as a method of assessment of PD-L1 expression by IHC, is not a reproducible assay. When using the ONEST method to assess multiple pathologist readers, as occurs in the real world, the overall percent agreement is quite low and probably unreliable in its current clinical usage.
Acknowledgments
Thank you to Lori Charette and the rest of the Yale Pathology Tissue Services core facility.
Funding
This work is supported by sponsored research agreement from Cepheid. Dr. Rimm is also supported by the Yale SPORE in Lung Cancer and the Yale Cancer Center CCSG. This publication was in part, made possible by CTSA Grant Number TL1 TR001864 supporting Dr. Aileen Fernandez from the National Center for Advancing Translational Science (NCATS), a component of the National Institutes of Health (NIH). Dr. Fernandez is supported by the Burroughs Wellcome Fund Postdoctoral Enrichment Program Award PDEP1022351.
Footnotes
Conflicts of Interest
David L. Rimm has served as an advisor for Astra Zeneca, Agendia, Amgen, BMS, Cell Signaling Technology, Cepheid, Danaher, Daiichi Sankyo, Genoptix/Novartis, GSK, Konica Minolta, Merck, NanoString, PAIGE.AI, Roche, and Sanofi. Amgen, Cepheid, NavigateBP, NextCure, and Konica Minolta have recently funded research in David L. Rimm’s lab.
Leena McCann, Yvonne G. Chan, Jodi Weidler, and Michael Bates are employees and stockholders in Cepheid, Inc. Other authors have no relevant conflicts of interest.
Ethics Approval and Consent to Participate
Written informed consent or waiver of consent was provided by all the patients. This study was approved by Yale Human Investigation Committee protocol ID 9505008219. This study was performed in accordance with the Declaration of Helsinki.
Data Availability Statement
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
- 1. Gong J, Chehrazi-Raffle A, Reddi S, Salgia R. Development of PD-1 and PD-L1 inhibitors as a form of cancer immunotherapy: a comprehensive review of registration trials and future considerations. Journal for ImmunoTherapy of Cancer 2018; 6: 1–18.
- 2. Pardoll DM. The blockade of immune checkpoints in cancer immunotherapy. Nat Rev Cancer 2012; 12: 252–264.
- 3. Chen DS, Mellman I. Oncology meets immunology: the cancer-immunity cycle. Immunity 2013; 39: 1–10.
- 4. United States Food and Drug Administration. List of Cleared or Approved Companion Diagnostic Devices (In Vitro and Imaging Tools). 2022. www.fda.gov/medical-devices/in-vitro-diagnostics/list-cleared-or-approved-companion-diagnostic-devices-in-vitro-and-imaging-tools.
- 5. Fuchs CS, Doi T, Jang RW et al. Safety and efficacy of pembrolizumab monotherapy in patients with previously treated advanced gastric and gastroesophageal junction cancer: phase 2 clinical KEYNOTE-059 trial. JAMA Oncology 2018; 4: e180013.
- 6. Davis AA, Patel VG. The role of PD-L1 expression as a predictive biomarker: an analysis of all US Food and Drug Administration (FDA) approvals of immune checkpoint inhibitors. Journal for ImmunoTherapy of Cancer 2019; 7: 1–8.
- 7. Dako North America, Inc. Interpretation Manual - Gastric or Gastroesophageal Junction Adenocarcinoma. 2019. www.agilent.com/cs/library/usermanuals/public/29219_pd-l1-ihc-22C3-pharmdx-gastric-interpretation-manual_us.pdf.
- 8. Jayasingam SD, Citartan M, Thang TH et al. Evaluating the polarization of tumor-associated macrophages into M1 and M2 phenotypes in human cancer tissue: technicalities and challenges in routine clinical practice. Frontiers in Oncology 2020; 9: 1512.
- 9. College of American Pathologists (CAP). CAP Proficiency Testing (PT) Programs, Quality Assessment Survey. 2021B. www.cap.org/laboratory-improvement/proficiency-testing.
- 10. Dako North America, Inc. PD-L1 IHC 22C3 pharmDx. Agilent & Pathology Solutions. 2017. https://www.agilent.com/cs/library/usermanuals/public/29219_pd-l1-ihc-22C3-pharmdx-gastric-interpretation-manual_us.pdf.
- 11. Gupta S, McCann L, Chan YG et al. Closed system RT-qPCR as a potential companion diagnostic test for immunotherapy outcome in metastatic melanoma. Journal for ImmunoTherapy of Cancer 2019; 7: 1–8.
- 12. Reisenbichler ES, Han G, Bellizzi A et al. Prospective multi-institutional evaluation of pathologist assessment of PD-L1 assays for patient selection in triple negative breast cancer. Modern Pathology 2020; 33: 1746–1752.
- 13. Han G, Schell MJ, Reisenbichler ES, Guo B, Rimm DL. Determination of the number of observers needed to evaluate a subjective test and its application in two PD-L1 studies. Statistics in Medicine 2022; 41: 1361–1375.
- 14. United States Food and Drug Administration. Summary of Safety and Effectiveness Data (SSED) for PD-L1 IHC 22C3 pharmDx. 2019. www.accessdata.fda.gov/cdrh_docs/pdf15/P150013S016B.pdf.
- 15. Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin 1971; 76: 378.
- 16. Fisher RA. On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. Metron 1921; 1: 3–32.
- 17. Martinez-Morilla S, McGuire J, Gaule P et al. Quantitative assessment of PD-L1 as an analyte in immunohistochemistry diagnostic assays using a standardized cell line tissue microarray. Laboratory Investigation 2020; 100: 4–15.
- 18. Morales-Betanzos CA, Lee H, Ericsson PIG et al. Quantitative mass spectrometry analysis of PD-L1 protein expression, N-glycosylation and expression stoichiometry with PD-1 and PD-L2 in human melanoma. Molecular & Cellular Proteomics 2017; 16: 1705–1717.
- 19. Cortes J, Cescon DW, Rugo HS et al. Pembrolizumab plus chemotherapy versus placebo plus chemotherapy for previously untreated locally recurrent inoperable or metastatic triple-negative breast cancer (KEYNOTE-355): a randomised, placebo-controlled, double-blind, phase 3 clinical trial. The Lancet 2020; 396: 1817–1828.
- 20. Schmid P, Cortes J, Dent R et al. Event-free survival with pembrolizumab in early triple-negative breast cancer. New England Journal of Medicine 2022; 386: 556–567.
- 21. Nuti S, Zhang Y, Zerrouki N et al. High interobserver and intraobserver reproducibility among pathologists assessing PD-L1 CPS across multiple indications. Histopathology 2022; 81: 732–741.
- 22. Ayers M, Lunceford J, Nebozhyn M et al. IFN-γ-related mRNA profile predicts clinical response to PD-1 blockade. The Journal of Clinical Investigation 2017; 127: 2930–2940.