Physiotherapy Canada. 2014 Apr 25;66(2):153–159. doi: 10.3138/ptc.2013-23

Examining Interrater Reliability and Validity of a Paediatric Cardiopulmonary Physiotherapy Discharge Tool

Jamil Lati, Vanessa Pellow, Jeannine Sproule, Dina Brooks, Cindy Ellerton
PMCID: PMC4006408  PMID: 24799752

ABSTRACT

Purpose: To determine the interrater reliability (IRR) of the individual items in the Paediatric Cardiopulmonary Physiotherapy (CPT) Discharge Tool. This tool identifies six critical items that physiotherapists should consider when determining a paediatric patient's readiness for discharge from CPT after upper-abdominal, cardiac, or thoracic surgery: oxygen saturation, mobility, secretion retention, discharge planning, auscultation, and signs of respiratory distress. Methods: A total of 33 paediatric patients (ages 2 to <19 years) who received at least 1 day of CPT following cardiac, thoracic, or upper-abdominal surgery were independently assessed using the Paediatric CPT Discharge Tool by two designated assessors, who assessed each patient within 4 hours of each other. Results: Kappa analysis showed the following levels of interrater agreement for the six items of the Paediatric CPT Discharge Tool: Oxygen Saturation, excellent (κ=0.80); Mobility, substantial (κ=0.62); Secretion Clearance, moderate (κ=0.39); Discharge Planning, fair (κ=0.37); and Auscultation and Respiratory Distress, poor (κ=0.24 and κ=−0.08, respectively). Conclusion: Several of the items in the Paediatric CPT Discharge Tool demonstrate good IRR. The discharge tool is ready for further psychometric testing, specifically validity testing.

Key Words: outcome assessment, patient discharge, pediatrics, reproducibility of results, cardiopulmonary physiotherapy


For adults, the occurrence of postoperative respiratory complications following cardiac, thoracic, and upper-abdominal surgery has been well documented in the literature.1–13 Children have the potential to develop similar postoperative complications;14 in fact, infants are at greater risk of postoperative respiratory failure because of underdeveloped intercostal muscles, a compliant chest wall with less compliant lungs, and poorly established collateral ventilation.15

Cardiopulmonary physical therapy (CPT), which aims to prevent and treat postoperative respiratory complications such as atelectasis, pneumonia, and secretion retention using a wide variety of techniques,16 is currently an important component of the perioperative care provided to children who have undergone upper-abdominal, thoracic, or cardiac surgery. In 2011, Novoa and colleagues showed that perioperative intensive CPT reduced morbidity in patients who underwent lobectomy,17 demonstrating that CPT is an important adjunct to care, in contrast to nursing guidance that recommends incentive spirometry and mobility alone.17

Health care systems continue to face limited resources in many areas of practice, which places increased responsibility on health care professionals to justify their treatment choices. The recent trend toward early discharge from acute care has made it essential to ensure that physiotherapy discharge criteria are evidence based.

In 2001, Brooks and colleagues developed clinical practice guidelines (CPGs) for perioperative CPT,18 the purpose of which was to evaluate the evidence for applying CPT techniques in managing patients after thoracic, cardiac, and abdominal surgery. Brooks and colleagues found that studies evaluating the effectiveness of CPT have been characterized by inconsistencies in identifying postoperative pulmonary complications.18 They subsequently developed a postoperative cardiopulmonary discharge tool (POP-DST) for the adult population to fill this gap in the literature.19 The purpose of the POP-DST is to determine when a patient should be discharged from CPT; it was developed for use specifically in adults who have undergone thoracic, cardiovascular, or upper-abdominal surgery. The POP-DST has demonstrated interrater reliability and predictive validity for patients at low risk of developing postoperative pulmonary complications after discharge.19

In 2011, using a modified Delphi technique, Ellerton and colleagues developed a paediatric CPT discharge tool to predict when a paediatric patient can be successfully discharged from CPT without relapse/re-referral (see Figure 1).20 The Paediatric CPT Discharge Tool is intended for use with children and youth aged 2–18 years who receive CPT following cardiac, thoracic, or upper-abdominal surgery. In their 2011 article, Ellerton and colleagues documented the process of determining the content of the tool and then evaluating its face and content validity and its feasibility.20

Figure 1. Paediatric Cardiopulmonary Physiotherapy Discharge Tool – Draft 2

The purpose of the Paediatric CPT Discharge Tool is to improve clinical decision making, communication among physiotherapists, and use of resources and to provide an outcome measure that could form the basis of future interventional studies. The purpose of our study was to determine the interrater reliability (IRR) of the Paediatric CPT Discharge Tool by evaluating the IRR of its individual items. Our secondary objectives were to identify potential scoring methods for future study and to explore validity by comparing the opinions of the treating and assessing therapists as to each patient's readiness for discharge.

Methods

The Research Ethics Boards at the Hospital for Sick Children and the University of Toronto approved this study. All participants over age 16 years, and the parents of those under age 16 years, provided signed informed consent before participating in the study. Informed assent was also obtained from participants between 8 and 15 years of age.

Participants

Participants were recruited from the Hospital for Sick Children in Toronto, Canada, over a 6-month period, using a consecutive sampling method. Children aged 2–18 years who had received at least 1 day of CPT during an in-patient admission after thoracic, cardiac, or upper-abdominal surgery were eligible to participate. Children with chronic but stable neurological or orthopaedic conditions were eligible for inclusion in the study, as the tool was designed to account for variability in baseline functional capacity.

Study design

We used a prospective blinded quantitative design to explore the psychometric properties of the paediatric CPT discharge tool developed by Ellerton and colleagues.20 We used two blinded assessors, both physiotherapists; Assessor A had 1 year's experience and Assessor B had 5 years' experience in CPT.

To test the complete range of patient acuity, we recruited participants from postoperative day 1 through 5. The consecutive sample design captured patients of varying degrees of severity within each postoperative day stratum.

Interrater reliability testing

IRR was evaluated using two physiotherapist assessors, chosen to reflect the range of experience at the Hospital for Sick Children. Neither assessor was the patients' treating therapist.

The assessors were given a current summary of each patient's relevant medical history from the treating therapist, which included activity orders, precautions, and contraindications. Assessors were blinded to the patients' surgical dates and did not have access to the charts, as physiotherapy progress notes might have biased their clinical impressions of the patients. The assessors independently evaluated each participant by selecting one of the mutually exclusive options for each of the six discharge criteria on the Paediatric CPT Discharge Tool (see Figure 1). The two assessors completed the Paediatric CPT Discharge Tool for each participant within 4 hours of each other.

To help us determine the overall IRR of the decision to discharge from CPT, each assessor was prompted to provide a professional opinion as to the patient's readiness for discharge from CPT, regardless of how the individual items on the tool were scored. We also used this information to explore the tool's concurrent validity.

Patients were stratified by postoperative day (i.e., five strata for days 1–5). Within each stratum, the assignment of the two therapists to the first patient assessment was randomized in blocks of four, to limit the impact of patient fatigue on the second assessor's scoring in relation to the first. Blocking ensured that each assessor conducted a similar number of first assessments. On completing each assessment, the assessors submitted their forms directly to the data manager for data entry and analysis.
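The article does not describe how the randomization sequence was generated. The following is only a minimal sketch of block randomization within strata, assuming a block size of four with two first assessments per assessor in each block. The function name `blocked_first_assessor`, the seeds, and the figure of eight patients per stratum are hypothetical and chosen for illustration.

```python
import random


def blocked_first_assessor(n_patients, block_size=4, seed=None):
    """Return which assessor ("A" or "B") assesses first for each patient in one stratum.

    Each complete block contains an equal number of "A" and "B" first assessments,
    so neither assessor is systematically the second (fatigue-affected) rater.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even to balance two assessors")
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    order = []
    while len(order) < n_patients:
        block = ["A", "B"] * (block_size // 2)
        rng.shuffle(block)
        order.extend(block)
    return order[:n_patients]


# One independent sequence per postoperative-day stratum (days 1 to 5).
# Eight patients per stratum is an arbitrary illustrative number.
schedule = {day: blocked_first_assessor(8, seed=day) for day in range(1, 6)}
for day, order in schedule.items():
    print(f"Postoperative day {day}: {' '.join(order)}")
```

Within any complete block, each assessor performs exactly two first assessments, which is the balancing property the blocking was intended to provide.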

Discharge tools, consent forms, and assent forms were collated and maintained by the data handler/manager in a locked filing cabinet in the Department of Rehabilitation Services at the Hospital for Sick Children. Data were entered into the Statistical Package for the Social Sciences, version 12.0 (SPSS Inc., Chicago, IL). Backup files were maintained on a second computer to limit any potential loss of data. The data were stripped of identifiers and entered into MS Excel 2003 (Microsoft Corp., Redmond, WA) for inclusion in the data analysis.

Consideration of scoring

A secondary objective of our study was to explore the extent of agreement of the assessors' ratings for each item to help determine whether an aggregate or critical scoring method would be more appropriate for use with the discharge tool. Data collected for IRR testing were also used to complete an analysis of different scoring methods.

Exploring concurrent validity

We further examined the data to determine the relationship between the assessors' discharge opinions and those of the treating therapist. The two assessors were prompted to indicate whether they considered the patient ready for discharge from CPT treatment; subsequently, the treating therapist was asked to complete a form for each patient indicating whether or not they considered the patient ready for discharge from CPT (yes/no) on the date the IRR assessments occurred. The treating therapist's opinion was considered the “gold standard” against which to evaluate the tool until future predictive validity of the tool is established.

Data Analysis

Interrater reliability testing

We used the kappa statistic, a chance-corrected measure of agreement,21 to determine IRR for the individual items in the tool. Kappa considers both the proportion of observed agreement and the proportion of agreement expected by chance. Each item on the tool was given a score of 1 if the item criteria were met or 0 if they were not met. In addition to the kappa statistic, we also assessed IRR by calculating percent agreement as a simple ratio of how many agreements were achieved between the two assessors relative to the total number of participants.
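The two agreement measures described above can be sketched for a single dichotomous item as follows. This is an illustration rather than the analysis code used in the study (the analysis was run in SPSS); the function name `agreement_stats` and the ten example ratings are hypothetical. Kappa is computed as (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected by chance from each assessor's marginal rates.

```python
def agreement_stats(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa) for two equal-length lists of 0/1 ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: proportion of participants given the same score by both assessors.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: both score 1 by chance, plus both score 0 by chance.
    p_a1 = sum(rater_a) / n
    p_b1 = sum(rater_b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)

    # Kappa is undefined when chance agreement is perfect (no variation in ratings).
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return p_o, kappa


# Hypothetical ratings for 10 participants (1 = criterion met, 0 = not met).
assessor_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
assessor_b = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

p_o, kappa = agreement_stats(assessor_a, assessor_b)
print(f"percent agreement = {p_o:.0%}, kappa = {kappa:.2f}")
```

In this hypothetical example most participants meet the criterion, so chance agreement is high and kappa comes out well below the raw percent agreement. This is one reason the two measures can diverge for the same item.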

Consideration of scoring

To consider the most appropriate method of scoring, we calculated the agreement between the scores of Assessors A and B using the kappa statistic. Because the tool has six items, the first scoring option we examined was an aggregate score in which one point was given for each discharge criterion met and all points were then summed to give an aggregate score out of 6 (Total Score). The second option was a critical item scoring method, the Total Coded Score, which was determined by giving a score of 1 if ≥3 of the six critical items met their criteria and 0 if <3 of the six critical items did so.

We also considered the measurement properties of alternative aggregate scores by examining how removing individual items from the total score affected agreement between the assessing therapists. The agreement of Aggregate Score 1 was calculated by removing the two individual items with the lowest kappa values from the total score; the agreement of Aggregate Score 2 was calculated by removing the three individual items with the lowest kappa values.
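The scoring options can be sketched as follows for one assessor's ratings of one participant. The sketch takes the Total Coded Score as 1 when at least three of the six criteria are met and 0 otherwise, and it uses the item sets reported for Aggregate Scores 1 and 2 in Table 2. The function names and the example ratings are hypothetical.

```python
ITEMS = [
    "Auscultation",
    "Respiratory Distress",
    "Oxygen Saturation",
    "Mobility",
    "Secretion Clearance",
    "Discharge Planning",
]

# Items dropped from each alternative aggregate score (those with the lowest kappa values).
EXCLUDED_AGGREGATE_1 = {"Auscultation", "Respiratory Distress"}
EXCLUDED_AGGREGATE_2 = EXCLUDED_AGGREGATE_1 | {"Discharge Planning"}


def total_score(item_scores):
    """Sum of all six items (each 1 if the criterion is met, 0 if not); range 0 to 6."""
    return sum(item_scores[item] for item in ITEMS)


def total_coded_score(item_scores, threshold=3):
    """Dichotomous score: 1 if at least `threshold` criteria are met, otherwise 0."""
    return 1 if total_score(item_scores) >= threshold else 0


def aggregate_score(item_scores, excluded):
    """Sum of the items that remain after removing the excluded items."""
    return sum(item_scores[item] for item in ITEMS if item not in excluded)


# Hypothetical ratings from one assessor for one participant.
ratings = {
    "Auscultation": 0,
    "Respiratory Distress": 1,
    "Oxygen Saturation": 1,
    "Mobility": 1,
    "Secretion Clearance": 0,
    "Discharge Planning": 1,
}

print("Total Score:", total_score(ratings))
print("Total Coded Score:", total_coded_score(ratings))
print("Aggregate Score 1:", aggregate_score(ratings, EXCLUDED_AGGREGATE_1))
print("Aggregate Score 2:", aggregate_score(ratings, EXCLUDED_AGGREGATE_2))
```

Computing each score separately for Assessor A and Assessor B, and then comparing the two assessors across participants, gives the agreement figures for each scoring method.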

Exploring concurrent validity

To facilitate future predictive validity testing, the final phase of our analysis examined agreement by comparing the overall discharge opinions of Assessors A and B, then comparing each assessor's discharge opinions to those of the treating therapist. This analysis helped identify whether the assessors were assessing the participant within the same context.

The Total Coded Score was also compared to the treating therapist's discharge opinion. We used the Total Coded Score for this analysis because its dichotomous nature lends itself to comparison with the treating therapist's yes/no response on readiness for discharge. The assessors' Total Coded Scores were compared to the treating therapist's discharge opinion to ensure that they were all examining the same criteria when developing a discharge opinion. This information will be valuable in evaluating whether the method of data collection was appropriate for future criterion validity testing.

Results

A total of 33 participants were enrolled in the study; there was no attrition. Participants' mean age was 9.7 (SD 4.6) years (range 2.0–17.5). Participants were assessed on postoperative days ranging from day 1 through 5 (mean day 3 [SD 1]). Within the sample, 11 participants (33%) underwent upper-abdominal surgery, 16 (49%) underwent thoracic surgery, and 6 (18%) underwent cardiac surgery.

Interrater reliability testing

Table 1 summarizes the results of IRR testing between Assessors A and B for individual items in the discharge tool using the kappa statistic. Oxygen Saturation demonstrated excellent IRR (κ=0.80, p<0.001); Mobility demonstrated substantial IRR (κ=0.62, p<0.001); Secretion Clearance demonstrated moderate IRR (κ=0.39, p=0.02); Discharge Planning showed fair IRR (κ=0.37, p=0.02); and both Auscultation and Respiratory Distress showed poor IRR (κ=0.24, p=0.16; κ=−0.08, p=0.63, respectively). For items with p-values <0.05 (Oxygen Saturation, Mobility, Secretion Clearance, and Discharge Planning), IRR was considered statistically significant20 (p-values for Respiratory Distress and Auscultation items were not significant). Table 1 also shows percent agreement between the two assessors for individual items in the discharge tool. Oxygen Saturation, Mobility, Secretion Clearance, and Discharge Planning items showed excellent percent agreement between the assessors, while Auscultation and Respiratory Distress showed moderate percent agreement.

Table 1.

Interrater Reliability Analysis of Individual Items on the Paediatric CPT Discharge Tool

| Item (Assessor A vs. Assessor B) | κ value | Interpretation of κ (correlation) | p-value | % agreement |
|---|---|---|---|---|
| Auscultation | 0.24 | Poor | 0.16 | 64 |
| Respiratory Distress | −0.08 | Poor | 0.63 | 64 |
| Oxygen Saturation | 0.80 | Excellent | <0.001* | 91 |
| Mobility | 0.62 | Substantial | <0.001* | 82 |
| Secretion Clearance | 0.39 | Moderate | 0.02* | 82 |
| Discharge Planning | 0.37 | Fair | 0.02* | 88 |

*Statistically significant values.

CPT=cardiopulmonary physiotherapy.

Consideration of scoring

The measurement properties of several scoring methods are shown in Table 2. Total Score showed poor kappa correlation (κ=0.22, p=0.01) and poor overall percent agreement between assessors (36%); on the other hand, the dichotomous Total Coded Score showed moderate kappa correlation (κ=0.46, p=0.01) and excellent percent agreement (85%). Table 2 also shows the kappa correlations of the alternative aggregate scores, which were calculated by removing items that demonstrated poor kappa correlation during IRR testing. Both Aggregate Score 1 (Oxygen Saturation + Secretion Clearance + Mobility + Discharge Planning) (κ=0.66, p<0.001) and Aggregate Score 2 (Oxygen Saturation + Secretion Clearance + Mobility) (κ=0.65, p<0.001) showed substantial kappa correlation, and all aggregate scores demonstrated statistical significance (p-values<0.05).

Table 2.

Correlation Achieved When Considering Different Scoring Methods (Assessor A vs. Assessor B)

| Tool Items Included in Score | κ value | Interpretation of κ (correlation) | p-value* | % agreement |
|---|---|---|---|---|
| Total Score | 0.22 | Poor | 0.01 | 36 |
| Total Coded Score | 0.46 | Moderate | 0.01 | 85 |
| Aggregate Score 1 (O2 Sat. + Secretion Clearance + Mobility + D/C Planning) | 0.66 | Substantial | <0.001 | 76 |
| Aggregate Score 2 (O2 Sat. + Secretion Clearance + Mobility) | 0.65 | Substantial | <0.001 | 76 |

*Statistically significant values.

O2 Sat.=oxygen saturation; D/C=discharge.

Exploring concurrent validity

Table 3 shows the relationship between the two assessors' opinions on readiness for discharge, as well as the relationships between the treating therapist's opinion and each assessor's opinion individually. The discharge opinions of the treating therapist showed a fair kappa correlation (κ=0.31, p=0.07) with those of both Assessor A and Assessor B; the discharge opinions of Assessors A and B showed a substantial kappa correlation (κ=0.75, p<0.001). The relationship between the assessing therapists' opinions on readiness for discharge reached statistical significance (p<0.05), but the relationship between the treating therapist's opinions and those of the two assessors did not (p>0.05).

Table 3.

Correlation of “Readiness for Discharge” Opinions

| Relationship: D/C Opinion | κ value | Interpretation of κ (correlation) | p-value | % agreement |
|---|---|---|---|---|
| A vs. B | 0.75 | Substantial | <0.001* | 88 |
| A vs. Treating PT | 0.31 | Fair | 0.07 | 67 |
| B vs. Treating PT | 0.31 | Fair | 0.07 | 67 |

*Statistically significant value.

D/C=discharge; PT=physiotherapist.

Discussion

Our analysis found positive trends in IRR for several of the critical items in the discharge tool. Oxygen Saturation, Mobility, and Secretion Clearance showed moderate to excellent IRR (κ=0.80, 0.62, and 0.39 respectively), perhaps because of the clear definitions of these variables. IRR was fair (κ=0.37) for Discharge Planning and poor for Auscultation and Respiratory Distress (κ=0.24 and −0.08 respectively). Note that although auscultation is commonly used in cardiopulmonary assessments, it has been shown to have questionable reliability and validity.22 Brooks and colleagues (1993) found only fair IRR (κ=0.26) for auscultation in a cohort of physiotherapists specializing in the cardiopulmonary field.

We found moderate kappa correlation between the two assessors when the analysis used a coded score, whereas kappa correlation was poor for the total score of all six items on the tool, although the percent agreement of the Total Coded Score was 85%. When we analyzed various aggregate scores, we found substantial kappa correlation by removing items with lower IRR. It would therefore be beneficial to consider using an aggregate scoring approach for the paediatric CPT discharge tool that includes all items except Auscultation and Respiratory Distress—the items with the lowest IRR.

Of 34 eligible patients, 33 were enrolled in this study, none of whom were re-referred to CPT within 3 days of discharge from CPT, for a 0% rate of re-referral to CPT following discharge.

Comparing the assessors' opinions on readiness for discharge and their Total Coded Scores showed significant correlation in both cases, which indicates that both assessors were basing their discharge opinions on the same criteria. This finding suggests that the discharge tool captures the criteria that are important in determining readiness for discharge, which may be helpful when an inexperienced therapist uses the tool to assist in determining readiness for discharge from CPT.

We subsequently compared (a) each assessor's discharge opinion to the treating therapist's discharge opinion and (b) each assessor's Total Coded Score to the treating therapist's discharge opinion. In both cases we found poor overall kappa correlation, which may indicate a gap in the information available to the blinded assessors (e.g., adherence to treatment regimen, stage of recovery, change over time) that would assist in making the overall decision to discharge. However, this tool is intended for eventual use by treating therapists, who would have access to all pertinent information on their patients.

Limitations

The first potential source of error that must be considered when interpreting the results of IRR testing is the 4-hour window between assessments, which may explain the low level of agreement between the assessors' and the treating therapist's opinions on patients' readiness for discharge. Ideally, the two assessments should have occurred at the same time, to minimize potential sources of measurement error.

A second potential source of error was the low sample size (n=33). We used the kappa correlation statistic to correct for agreement that may be due to chance; however, kappa is ideally meant to be used for dichotomous measures (i.e., yes/no or 0/1). Because of the broad range of scoring options available when considering a total score (0–6), if scoring agreement was not exact, no correlation was found, leading to a low kappa value. A sample size of 55 would have ensured a minimum kappa of 0.60 for an item with excellent IRR.23
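The effect of requiring exact agreement on a 0–6 total score can be illustrated with a small sketch. The total scores below are hypothetical and were chosen so that the two assessors often differ by a single point; they are not study data. The dichotomization uses the same ≥3 cut-off as the Total Coded Score.

```python
def percent_agreement(scores_a, scores_b):
    """Proportion of participants for whom the two assessors gave identical scores."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("scores must be non-empty and of equal length")
    return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)


# Hypothetical total scores (0 to 6) from two assessors for eight participants.
totals_a = [6, 5, 4, 3, 5, 2, 6, 4]
totals_b = [5, 5, 3, 4, 6, 2, 6, 5]

# Exact agreement on the seven-level total score: a one-point difference counts as disagreement.
exact = percent_agreement(totals_a, totals_b)

# Agreement after coding each total as 1 (at least 3 criteria met) or 0 (fewer than 3).
coded_a = [int(t >= 3) for t in totals_a]
coded_b = [int(t >= 3) for t in totals_b]
coded = percent_agreement(coded_a, coded_b)

print(f"exact agreement on total score: {exact:.0%}")
print(f"agreement on coded score:       {coded:.0%}")
```

Small differences that leave both totals on the same side of the cut-off count as disagreement for the total score but as agreement for the coded score, which is consistent with the pattern seen between the Total Score and the Total Coded Score.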

Conclusion

Overall, the Paediatric CPT Discharge Tool demonstrated good interrater reliability (IRR). Furthermore, our analysis shows that this tool is reliable when used by therapists with a range of experience in CPT. Limitations may be due in part to the small sample size of 33, to changes in patient status that may have occurred during the 4-hour gap between assessments, and to poor IRR of auscultation, as previously reported in the literature.22

Better agreement between the two assessors was achieved using the Total Coded Score rather than the total score of all six items. Analysis of various aggregate scores found substantial agreement when the two or three lowest-correlated items on the tool were removed. The best aggregate score eliminated the two items, Auscultation and Respiratory Distress, that had the lowest IRR. It would be beneficial to consider using either this aggregate scoring approach, in which the patient must achieve the criterion definition for four critical items (Oxygen Saturation, Mobility, Secretion Clearance, and Discharge Planning), or a Total Coded Score, whereby patients who achieved ≥3 criteria were considered ready for discharge from CPT.

Key Messages

What is already known on this topic

Currently, physiotherapists practising cardiopulmonary physiotherapy (CPT) rely on historical or centre-specific approaches as well as their clinical impressions and experiences to make treatment decisions. There are no studies in the literature that incorporate or examine CPT discharge criteria in the paediatric population. Therapists must attempt to synthesize the individual components of an assessment into a clinical impression on which they can base their decision regarding a patient's readiness for discharge from CPT. Reliance on a therapist's experience to guide clinical decision making may be problematic, both because new physiotherapy graduates are often employed in entry-level CPT positions and because physiotherapists from other clinical specialties are often required to provide evening and weekend CPT service in the acute-care environment, which requires them to make clinical decisions regarding a less familiar caseload, often with little access to experts in the field.

What this study adds

A valid and reliable Paediatric CPT Discharge Tool would aid in clinical decision making for inexperienced therapists as well as providing support for more experienced therapists and those who specialize in areas other than CPT. Such a tool would enhance the design of future interventional research studies by providing an outcome measure against which to evaluate effectiveness of CPT treatments. This study evaluates the interrater reliability of the items of the tool and alternative scoring methods, thus completing an essential prerequisite for the evaluation of the Paediatric CPT Discharge Tool's predictive validity.

