AEM Education and Training. 2017 Sep 21;1(4):340–345. doi: 10.1002/aet2.10051

The Council of Emergency Medicine Residency Directors Speaker Evaluation Form for Medical Conference Planners

Andrew W Phillips 1, David Diller 2, Sarah Williams 3, Yoon Soo Park 4, Jonathan Fisher 5, Kevin Biese 6, Jacob Ufberg 7
Editor: Sebastian Uijtdehaage
PMCID: PMC6001733  PMID: 30051053

Abstract

Objectives

No summative speaker evaluation form with validity and reliability evidence currently exists in the English-language medical education literature specifically to help conference planners make decisions about future speakers. We sought to perform a proof-of-concept evaluation of a concise, effective evaluation form, completed by audience members, to aid conference planners.

Methods

We created the Council of Emergency Medicine Residency Directors (CORD-EM) form, a novel three-question speaker evaluation form, for the CORD-EM national conference and evaluated it as a proof of concept. The form was completed by three evaluators, and only two evaluators' ratings per speaker were randomly selected for analysis to make the results more generalizable to a generic audience evaluating the speaker.

Results

Sum scores for the 46 total evaluations ranged from 6 to 9 (mean ± standard deviation = 8.1 ± 1.2). The form demonstrated excellent internal consistency (Cronbach's alpha = 0.923) with good inter-rater reliability (intraclass correlation = 0.617) in the conference context.

Conclusions

The CORD‐EM speaker evaluation form is, to our knowledge, the first evaluation form with early reliability and validity evidence specifically designed to help conference planners. Our results suggest that a short speaker evaluation form can be an effective instrument in the toolbox for conference planners.


Speaker evaluations are a central part of medical conferences and a requirement for conferences to offer continuing medical education (CME) credits that are necessary for physician licensure in the United States.1 Additionally, lectures are a central part of clinicians’ continuing education, and lecturer selection determines the experiences of participants.

Currently, several resources describe how to give effective presentations.2, 3, 4, 5, 6 However, there are few resources with supportive evidence to help medical conference planners evaluate their speakers.7, 8, 9 The CME literature in recent years has emphasized the need for tangible learning outcomes,10 but designing, collecting validity evidence for, and administering pre- and posttests to all CME participants for every lecture is impractical, leaving conference planners with only speaker evaluation forms to help make practical decisions about speakers for the following year. Speaker evaluations also remain a requirement for CME-granting conferences.1

The speaker evaluation tools described in the medical literature to date have not been evaluated for their ability to discern presentation quality at CME‐granting conferences. One available tool is a formative feedback system for speakers that emphasizes narrative comments in addition to eight 5‐point Likert‐type items.7 The only other tool in the peer‐reviewed literature to date is that by Wood and colleagues,8 also designed specifically as feedback for presenters. Wood's instrument consists of nine 7‐point Likert‐type items. It was tested in the grand rounds setting for two specialties and demonstrated excellent reliability. However, formative and summative evaluations are different endpoints,11 and these two previously described tools were not designed to differentiate speaker quality for the purposes of choosing the best speakers so that they may be preferentially invited to give presentations at subsequent events. Additionally, national conferences are a distinct setting from grand rounds, despite similar audiences.

In contrast, the general communication field, represented by the National Communication Association, has a well‐established instrument to differentiate speaker quality. However, the instrument is intended for competitions, short speeches, and topics of the speaker's choice, among other limiting factors.12

In summary, no criterion standard evaluation instrument currently exists to guide conference planners when they are tasked with selecting speakers each year. The objective of this pilot study was to develop and evaluate a short, summative speaker evaluation form for use by national CME‐granting conference planners.

Methods

The study occurred at the Council of Residency Directors in Emergency Medicine (CORD‐EM) annual Scientific Assembly in Phoenix, Arizona, in April 2015. CORD‐EM (http://www.cordEM.org) is the society for emergency medicine residency program leadership, primarily in the United States. The study was granted quality improvement exemption by the Temple University Institutional Review Board. Participants received no compensation. No funding was received for this study.

Instrument Development

The CORD-EM evaluation form was based on the three domains in the Competent Speaker Form created by the National Communication Association: speaker knowledge, skills, and ability to motivate.12 It consisted of three items pertaining to speaker quality (directly based on the three domains) and two items pertaining to bias and topic preference, as required by the CORD CME-granting agency (see Figure 1). It is worth noting that the last domain, ability to motivate, also reflects current trends in CME to consider Kirkpatrick's levels of change,13 but in the context of the speaker: this domain evaluates the speaker's ability to motivate the audience to apply the material in practice. The latter two items were not analyzed because they were not directly related to presenter ability. The CORD-EM form sums the three items to produce a formal Likert scale score. The competency items were reviewed by three current and past CORD-EM conference planners (JF, KB, and JU) and confirmed to be relevant for choosing speakers at future conferences. The wording of each item was discussed at length by the current and past conference planners and piloted among five faculty and two residents with conference experience to ensure clarity and content validity of the survey items.

Figure 1. Council of Emergency Medicine Residency Directors speaker evaluation form.

The items were measured on a 3-point Likert-type scale anchored by "below average," "average," and "above average," compared with other CORD-EM speakers at the current conference. Evaluations began on Day 2 of the conference, which allowed evaluators to see other presenters before conducting evaluations. The wording "current conference" was intentionally chosen because speaker performance can vary from year to year, as can performance at different conferences. Although the "current conference" wording inevitably leaves an evaluator's first speaker without a comparison, the group consensus was that this would nonetheless provide better overall reliability than comparison with other conferences and years.
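
To make the scoring concrete, a minimal sketch follows (in Python; the field names and electronic representation are hypothetical and do not reflect any implementation described in the study). It maps the three anchor responses to values of 1–3 and sums them into the 3–9 scale score, keeping the two CME-required items out of the score, as in the analysis.

```python
from dataclasses import dataclass

# Anchor wording from the form, mapped to the 1-3 values used for scoring.
ANCHORS = {"below average": 1, "average": 2, "above average": 3}

@dataclass
class SpeakerEvaluation:
    """One completed CORD-EM form (field names are hypothetical)."""
    knowledge: str          # item 1: speaker knowledge
    skills: str             # item 2: conveying how to use the information
    motivation: str         # item 3: ability to motivate use in practice
    perceived_bias: bool    # CME-required item, not scored
    topic_preferred: bool   # CME-required item, not scored

    def sum_score(self) -> int:
        """Formal Likert scale score: sum of the three competency items (range 3-9)."""
        return sum(ANCHORS[v] for v in (self.knowledge, self.skills, self.motivation))

evaluation = SpeakerEvaluation("above average", "above average", "average", False, True)
print(evaluation.sum_score())  # -> 8
```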

Instrument Deployment

The CORD‐EM form was specifically designed to be short and distributed to attendees without any formal instruction. For the purposes of this study, three volunteers intermittently evaluated speakers based on the volunteers’ conference schedules over 2 days so that all speakers were evaluated by at least two of the three evaluators in different combinations. The selected speakers were a convenience sample due to the volunteers’ conference schedules.

Validity

The CORD-EM form was evaluated using an interpretation of Messick's unified validity framework14, 15 (Table 1). This framework requires researchers to consider multiple aspects of validity evidence to build a case for a cohesive validity assessment. In short, it is a formal way to evaluate whether an instrument accurately measures the intended concept.

Table 1.

Messick's Unified Theory of Construct Validity14

Content aspect: Does the instrument have relevant and representative questions to the overall theme?
Substantive aspect: Are theoretical rationales and empirical evidence consistent with respondents engaging the questions as intended?
Structural aspect: Do the domains support the overall scoring structure?
Generalizability aspect: How well do the score properties and interpretations generalize to different populations and contexts?
External aspect: Do the instrument's results converge and diverge (convergent and discriminant evidence) with other instruments' results as theoretically expected?
Consequential aspect: Are the implications of the score appropriate for its intended use, especially in settings with potential bias?

Internal structure validity evidence (internal consistency reliability) was assessed with Cronbach's alpha, and response process validity evidence (inter-rater reliability) was assessed with the intraclass correlation (ICC). Although the raters participating in scoring were unbalanced (i.e., some presentations were evaluated by two raters and others by three), ratings were randomly subsampled in the analyses to optimize calculation of the inter-rater reliability indices, with the goal of being able to generalize the findings.16, 17
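
The analyses themselves were run in SPSS; purely as an illustration of the randomized subsampling and the ICC, the sketch below (Python with NumPy, hypothetical scores) keeps two randomly chosen ratings per speaker and computes a one-way random-effects ICC(1,1). The one-way form is our assumption, chosen because it is appropriate when the pair of raters differs from speaker to speaker; the paper does not state which ICC model was used.

```python
import numpy as np

def icc_one_way(scores: np.ndarray) -> float:
    """One-way random-effects, single-rating ICC(1,1) (Shrout & Fleiss)
    for an n_speakers x k_ratings matrix."""
    n, k = scores.shape
    grand_mean = scores.mean()
    speaker_means = scores.mean(axis=1)
    # Between-speaker and within-speaker mean squares from one-way ANOVA.
    ms_between = k * np.sum((speaker_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - speaker_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rng = np.random.default_rng(2015)
# Hypothetical 3-9 sum scores: rows are speakers, columns are the three evaluators
# (np.nan where an evaluator did not attend that talk).
ratings = np.array([
    [9.0, 8.0, 9.0],
    [6.0, 7.0, np.nan],
    [8.0, np.nan, 9.0],
    [9.0, 9.0, 8.0],
    [7.0, 6.0, 7.0],
])
# Randomly keep two observed ratings per speaker so every speaker contributes
# a balanced pair, mirroring the randomization described above.
pairs = np.array([rng.choice(row[~np.isnan(row)], size=2, replace=False) for row in ratings])
print(f"ICC(1,1) on the subsampled pairs: {icc_one_way(pairs):.3f}")
```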

Data Analysis

All data were entered into Excel (Microsoft Corp.) and analyzed in SPSS version 21 (IBM Corp.). The ICC was chosen because it accommodates three raters and provides generalizability of the results. Although the data are ordinal on a Likert-type scale, there is strong precedent for treating such data as continuous, making the ICC applicable.18, 21

Results

Sum scores on the CORD-EM tool (possible range 3–9) for 21 independent speakers ranged from 6 to 9 (mean ± standard deviation [SD] = 8.1 ± 1.2; see Figure 2 for the graphic distribution). Among the 46 total recorded evaluations for the 21 speakers, 17.4% of evaluations summed to six, 10.9% summed to seven, 17.4% summed to eight, and 54.3% summed to nine. The motivation item ranged from 1 to 3 (mean ± SD = 2.6 ± 0.5); knowing the material ranged from 2 to 3 (mean ± SD = 2.8 ± 0.4); and conveying how to use the information (skills) ranged from 2 to 3 (mean ± SD = 2.7 ± 0.5).
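
As a quick arithmetic check of the reported summary statistics, the sketch below (Python; the counts of 8, 5, 8, and 25 are implied by the reported percentages of the 46 evaluations and are not taken from the original data set) reproduces the reported mean and SD from the score distribution.

```python
import math

# Counts of sum scores implied by the reported percentages of 46 evaluations:
# 17.4% -> 8 sixes, 10.9% -> 5 sevens, 17.4% -> 8 eights, 54.3% -> 25 nines.
counts = {6: 8, 7: 5, 8: 8, 9: 25}
n = sum(counts.values())                                   # 46 evaluations
mean = sum(score * k for score, k in counts.items()) / n   # ~8.09
sd = math.sqrt(sum(k * (score - mean) ** 2 for score, k in counts.items()) / (n - 1))
print(f"n = {n}, mean = {mean:.1f}, SD = {sd:.1f}")        # n = 46, mean = 8.1, SD = 1.2
```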

Figure 2. Distribution of Council of Emergency Medicine Residency Directors speaker evaluation scale scores by percentage. Potential scale range 3–9. n = 46.

Cronbach's alpha, a measure of internal consistency reliability and internal structure validity evidence, was 0.923. Item discrimination, a measure of how the domains support the construct, ranged from 0.891 to 0.908.
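
For readers who want to reproduce these indices on their own data, a short sketch follows (Python with NumPy; the item-level ratings are hypothetical, and corrected item–total correlation is our assumed operationalization of item discrimination, since the study reports only that the analysis was run in SPSS).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an n_evaluations x n_items rating matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def item_discrimination(items: np.ndarray) -> np.ndarray:
    """Corrected item-total correlation: each item vs. the sum of the other items."""
    return np.array([
        np.corrcoef(items[:, j], np.delete(items, j, axis=1).sum(axis=1))[0, 1]
        for j in range(items.shape[1])
    ])

# Hypothetical item-level ratings (1-3) for the three competency items.
items = np.array([
    [3, 3, 3], [2, 2, 2], [3, 3, 2], [3, 3, 3],
    [2, 3, 2], [3, 2, 3], [2, 2, 3], [3, 3, 3],
])
print(f"Cronbach's alpha: {cronbach_alpha(items):.3f}")
print("Item discrimination:", np.round(item_discrimination(items), 3))
```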

Overall inter-rater reliability, contributing to response process validity evidence for the tool, met criteria for "good" in this evaluation but should be considered a rough estimate given the wide confidence interval (CI; ICC = 0.617 [95% CI = 0.055–0.844];19 see Table 2 for item-specific characteristics).

Table 2.

Item Characteristics for the Council of Emergency Medicine Residency Directors Speaker Evaluation Form

Item    Item Discrimination Correlation    ICC (95% CI)
1       0.891                              0.331 (–0.649 to 0.729)
2       0.891                              0.247 (–0.857 to 0.694)
3       0.908                              0.710 (0.286 to 0.882)
Sum     N/A                                0.617 (0.055 to 0.844)

ICC = intraclass correlation; N/A = not applicable.

Discussion

Interpretation of Validity Evidence

All aspects of Messick's validity definition except the external and generalizability aspects were supported for the CORD-EM form. Based on an extensive literature review and expert review, the content aspect was fulfilled. The scale's SD was 1.2, representing just over 10% of the 9-point scale, rendering the substantive aspect acceptable by our interpretation. The internal structure validity evidence was sound, supported by excellent item discrimination ranging between 0.891 and 0.908 and a Cronbach's alpha of 0.923. These findings suggest that the domains discriminated between each other while remaining part of the same cohesive construct of a high-quality speaker; in other words, each of the three components successfully measured a different aspect of the concept of a good speaker. The consequential aspect was supported by use of the scale at the following CORD-EM meeting, where scores informed invitations for that year's speakers. As with any instrument, bias related to factors such as sex, age, or race is a potential consequence, but its evaluation lies outside the scope of this initial study. The generalizability aspect is difficult to evaluate without applying the instrument to other populations and contexts, and the external aspect ("relations to other variables") could not be evaluated because no comparison instruments were available.

Summary

The novel CORD-EM speaker evaluation form demonstrated early validity evidence on both theoretical and empirical grounds. The CORD-EM form further showed excellent internal consistency and good inter-rater reliability, in addition to being a practical, short form that requires no evaluator training.

Comparisons to Prior Literature

Our results are consistent with the assertion that knowledge, skill, and ability to motivate are fundamental features of a "good speaker."12 Prior literature supports this since knowledge and enthusiasm (an arguably similar concept to motivation) are components of the previously evaluated tool by Wood et al.8

The Wood et al.8 and Wittich et al.7 tools measure similar concepts to the CORD‐EM tool, but the CORD‐EM tool has validity evidence specifically for summative evaluation use by conference planners. Our study is also the first to evaluate a speaker evaluation tool at—and explicitly for—a national medical conference. Furthermore, its short format, approximately one‐third the length of the other forms, may improve audience response rates.20

Limitations and Future Work

As a pilot study, our study has several limitations. First, it was tested at a single meeting for medical education specialists, which may have artificially improved the reliability of the CORD-EM tool compared with evaluators and speakers who are not educators. We also had a relatively small number of evaluators and speakers, each a convenience sample, because of time and logistical constraints at the conference; moreover, within the small EM community, the evaluators and speakers may have known each other.

The intent of the shorter CORD-EM form was to improve audience response rates, which are a persistent problem at medical conferences.22, 23 The psychometric results support the validity evidence of the form, and the next step is to study whether the shortened form significantly improves response rates.

Conclusions

The Council of Emergency Medicine Residency Directors speaker evaluation form, with its simplicity and early validity evidence as a tool for conference planners, has the potential to better identify which speakers at a medical conference are the "best" speakers and should be invited to speak again. Further studies are needed both to test this short form at other specialty conferences and to determine whether it yields superior response rates.

The authors thank Drs. Megan Fix, Damon Kuehl, Janis Tupesis, and Moshe Weizberg for their contributions to the reliability studies as evaluators.

AEM Education and Training 2017;1:340–345.

Presented at Council of Emergency Medicine Residency Directors Academic Assembly, Nashville, TN, March 6–9, 2016.

The authors have no relevant financial information or potential conflicts to disclose.

References

• 1. Accreditation Council for Continuing Medical Education. The Accreditation Requirements and Descriptions of the Accreditation Council for Continuing Medical Education (ACCME). 2014. Available at: http://www.accme.org/sites/default/files/626_20140626_Accreditation_Requirements_Document_0.pdf. Accessed Aug 31, 2017.
• 2. Garity J. Creating a professional presentation. A template of success. J Intraven Nurs 1999;22:81–6.
• 3. Gelula MH. Effective lecture presentation skills. Surg Neurol 1997;47:201–4.
• 4. Brown G, Manogue M. AMEE Medical Education Guide No. 22: Refreshing lecturing: a guide for lecturers. Med Teach 2001;23:231–44.
• 5. Copeland HL, Hewson MG, Stoller JK, Longworth DL. Making the continuing medical education lecture effective. J Contin Educ Health Prof 2005;18:227–34.
• 6. Copeland HL, Longworth DL, Hewson MG, Stoller JK. Successful lecturing: a prospective study to validate attributes of the effective medical lecture. J Gen Intern Med 2000;15:366–71.
• 7. Wittich CM, Mauck KF, Mandrekar JN, et al. Improving participant feedback to continuing medical education presenters in internal medicine: a mixed-methods study. J Gen Intern Med 2012;27:425–31.
• 8. Wood TJ, Marks M, Jabbour M. The development of a participant questionnaire to assess continuing medical education presentations. Med Educ 2005;39:568–72.
• 9. Cunningham L. Is your conference evaluation form yielding meaningful results? Survey Systems. 2013. Available at: http://info.sur-sys.com/blog/bid/345632/Is-your-Conference-Evaluation-Form-Yielding-Meaningful-Results. Accessed Sep 13, 2017.
• 10. Committee on Planning a Continuing Health Professional Education Institute, Institute of Medicine. Redesigning Continuing Education in the Health Professions. Available at: http://nap.edu/12704. Accessed Sep 13, 2017.
• 11. Ende J. Feedback in clinical medical education. JAMA 1983;250:777.
• 12. Morreale S. "The Competent Speaker": Development of a Communication-Competency Based Speech Evaluation Form and Manual. 2nd ed. Washington, DC: National Communication Association, 2007.
• 13. Kirkpatrick DL. Techniques for evaluating training programs. J ASTD 1959;75:3–9.
• 14. Messick S. Meaning and values in test validation: the science and ethics of assessment. Educ Res 1989;18(2):5–11.
• 15. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. Am J Med 2006;119:166.e7–16.
• 16. DeMars C. Estimating variance components from sparse data matrices in large-scale educational assessments. Appl Meas Educ 2015;28:1–13.
• 17. Kreiter CD, Ferguson K, Lee WC, Brennan RL. A generalizability study of a new standardized rating form used to evaluate students' clinical clerkship performances. Acad Med 1998;73:1294.
• 18. Norman G. Likert scales, levels of measurement and the "laws" of statistics. Adv Health Sci Educ 2010;15:625–32.
• 19. Cicchetti DV, Sparrow SA. Developing criteria for establishing interrater reliability of specific items: applications to assessment of adaptive behavior. Am J Ment Defic 1981;86:127–37.
• 20. Phillips AW, Reddy S, Durning SJ. AMEE Guide No. 102: Response rates and nonresponse bias in medical education research. Med Teach 2016;38:217–28.
• 21. Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol 2012;8:23–34.
• 22. Vijayashankar MR. Evaluation of speakers at CME: Cosmecon 2006, an international conference on ageing and anti-ageing. J Cutan Aesthet Surg 2008;1:98–102.
• 23. Collins J, Mullan BF, Holbert JM. Evaluation of speakers at a national radiology continuing medical education course. Med Educ Online 2002;7:4540.
