Abstract
The American Board of Family Medicine (ABFM) has used a 60-item Multiple Choice Question (MCQ) section followed by a Virtual Patient (VP) exercise in Maintenance Of Certification (MOC) since 2004, and has had an asthma module since 2005. The original asthma VP criteria anticipated some Expert Panel Report-3 recommendations, such as home peak flow monitoring and a written plan, that were added to the MCQ section only when the guideline was updated in 2007. VP completion rates for these criteria improved markedly with the MCQ update, while other criteria completion rates were stable. Asthma criteria completion rates are not predicted by the strength of evidence for the criteria. User interface details influence criteria completion rates, but did not affect the changes observed in 2007. Asthma MCQ content affects Diplomate performance on asthma VP: this translational step suggests that MOC exercises could result in improved care for real patients.
BACKGROUND
Since 2004, Diplomates (family physicians certified by the ABFM) have been required to complete between 5 and 7 self-assessment Maintenance of Certification (MOC) exercises between recertification examinations, and can recertify on a 10-year schedule rather than a 7-year schedule if they complete at least 2 MOC exercises every 3 years. Each MOC module focuses on a disease process, such as asthma or diabetes 1,2. An MOC exercise comprises a 60-item Multiple Choice Question (MCQ) section and a Virtual Patient (VP) management problem (figure 1). Both sections are open book. Diplomates must answer all 60 items before they can review them with linked critiques and references, and must correctly answer 80% (48) before proceeding to the VP. Thus, the MCQ section allows the ABFM to direct Diplomates’ attention to specific content, the VP section requires Diplomates to rehearse this content, and the whole MOC exercise could encourage Diplomates to apply content in real practice.
Figure 1.
ClinSim opening screen
The ABFM currently uses VP models of 13 topics in MOC. These models generate VP that have one obvious health issue and are truthful, adherent, and responsive to all reasonable treatments. None of the VP present with treatment underway. Therefore, Diplomates’ actions with these VP reflect their general concern about interactions with the ABFM, their first response to the health problem portrayed, and their facility with the user interface, called ClinSim. Diplomates often are anxious about new ABFM activities, and may invest considerable time trying to anticipate all useful queries and interventions in an open-ended series of encounters with a VP. Most queries and prescriptions were easy to locate in ClinSim, but some have taken time to arrange so that Diplomates could find them consistently.
Diplomates must complete half of the scoring criteria specified for a VP to pass the MOC module. The criteria are widely accepted and typically relate to content reviewed in the immediately preceding MCQ. Thus, VP criteria completion could predict Diplomates’ ability to recall and apply evaluation and management principles to relatively simple patients in real practice, with the limitation that ClinSim is a potential barrier to demonstrating that ability.
The original asthma VP, published in 2005, anticipated 3 criteria that were added to the MCQ section in 2007, when the Expert Panel Report-3 endorsed home peak flow monitoring, written action plans, and influenza vaccination 3. The report also described the strength of evidence for each recommendation. We review performance on VP generally, and on asthma VP in detail, with attention to the relationship between MCQ content and VP criteria completion rates.
METHODS
We reviewed ABFM records of VP cases and, for the asthma topic, criteria completion as of December 31, 2010. We calculated 1st attempt pass rates for all simulations.
For the asthma simulation, we calculated individual criterion completion rates for all simulations completed in each year from 2005 to 2010, and over the entire 6-year period. For each criterion, we correlated the strength of evidence as assessed by the National Asthma Education and Prevention Program Expert Panel Report-3 (EPR-3) or the ABFM with completion rates. EPR-3 classified strength of evidence using the following lettered evidence categories:
A. Multiple randomized controlled trials (RCTs) that generate a rich and consistent body of data.
B. RCTs that comprise a more limited body of data, for instance when there are few RCTs, the RCTs are small in size, the RCT population is not representative, or the results are inconsistent.
C. Nonrandomized trials and observational studies.
D. Panel consensus judgment.
The ABFM uses the Strength Of Recommendation Taxonomy (SORT), an evidence grading system using familiar letter grades, agreed upon by the editors of the major family medicine journals in 2004 4,5. SORT A indicates high quality evidence from consistent, clinically meaningful, patient oriented outcomes in well-controlled trials. SORT B indicates inconsistent or limited quality evidence. SORT C indicates low quality evidence, for instance from consensus, tradition, or disease-oriented studies.
We correlated the strength of recommendation for each criterion with the completion rate using the non-parametric Wilcoxon/Kruskal-Wallis rank sums test (JMP 5.0.1.2). Finally, we plotted the criteria completion rate over the 6-year period to observe temporal trends. Individuals and moving range (IR) process control charts were plotted to detect changes in completion rates.
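As an illustration of the process control step, the following minimal sketch computes limits for an individuals and moving range chart using the conventional constants for moving ranges of two consecutive points (2.66 and 3.267). The yearly rates are hypothetical values chosen only to show how a signal is flagged; they are not the study data, and the function name `ir_chart` is an assumption of this sketch rather than anything provided by JMP.

```python
from statistics import mean

# Conventional constants for charts built from moving ranges of two points.
I_CHART_FACTOR = 2.66   # 3 / d2, with d2 = 1.128
MR_UCL_FACTOR = 3.267   # D4 for subgroups of size 2


def ir_chart(values):
    """Return control limits and out-of-limit indices for an IR chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    lcl = center - I_CHART_FACTOR * mr_bar
    ucl = center + I_CHART_FACTOR * mr_bar
    mr_ucl = MR_UCL_FACTOR * mr_bar
    return {
        "center": center,
        "lcl": lcl,
        "ucl": ucl,
        "mr_ucl": mr_ucl,
        # Individual points outside the control limits.
        "points_out": [i for i, v in enumerate(values) if v < lcl or v > ucl],
        # Moving range i compares values[i] and values[i + 1]; report it at i + 1.
        "ranges_out": [i + 1 for i, mr in enumerate(moving_ranges) if mr > mr_ucl],
    }


# HYPOTHETICAL yearly completion rates (%), not data from this study.
years = [2005, 2006, 2007, 2008, 2009, 2010]
rates = [40.0, 42.0, 63.0, 64.0, 65.0, 66.0]

chart = ir_chart(rates)
low_or_high_years = [years[i] for i in chart["points_out"]]
jump_years = [years[i] for i in chart["ranges_out"]]
print(low_or_high_years, jump_years)
```

With only six yearly points the limits are rough estimates, so a signal from a chart like this is best read as a prompt for further inspection rather than a formal test.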
RESULTS
The Year column in table 1 indicates the year that the module was introduced. The number of Diplomates who have successfully completed the module appears in the next column. The 1st Pass column shows the percentage of Diplomates who completed at least 50% of the criteria on the first attempt, prior to viewing the complete list of criteria.
Table 1.
Summary of virtual patient pass rates.
| Module | Year | # Diplomates | % Passing on 1st Try |
|---|---|---|---|
| Diabetes | 2004 | 27,435 | 96.56 |
| Hypertension | 2004 | 25,765 | 98.99 |
| Asthma | 2005 | 20,577 | 81.42 |
| Coronary Artery Disease | 2005 | 16,522 | 92.57 |
| Depression | 2006 | 12,935 | 98.76 |
| Heart Failure | 2006 | 8,078 | 88.75 |
| Well Child Care | 2007 | 6,733 | 88.34 |
| Pain Management | 2007 | 8,361 | 80.60 |
| Health Behavior | 2008 | 5,388 | 82.31 |
| Maternity Care | 2008 | 3,502 | 92.03 |
| Care of Vulnerable Elders | 2009 | 3,538 | 61.57 |
| Childhood Illness | 2009 | 4,147 | 75.26 |
| Cerebrovascular Disease | 2010 | 2,941 | 51.75 |
More than 90% of Diplomates pass the diabetes, hypertension, coronary artery disease, and depression simulations on their first attempt. Fewer than 85% pass the asthma, pain management, and health behavior simulations. The most recent simulations have substantially lower 1st attempt pass rates.
The asthma simulation uses 19 criteria listed in table 2. The SOAP column indicates whether the criterion is related to Subjective data, Objective data, Assessment (no criteria), or Plan (i.e. management). The Criterion name describes a collection of suitable actions. For instance, either inhaled steroids or leukotriene inhibitors can fulfill the “long-term asthma control medication” criterion. ABFM indicates the SORT grade assigned by the ABFM to the criterion. EPR-3 indicates the evidence grade assigned in EPR-3. Some concepts that were introduced in EPR-2 were carried into EPR-3 but did not receive evidence grades in either report. The completion rate indicates the percentage of 25,498 simulations in which the criterion was satisfied.
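The completion rate calculation can be sketched as follows, assuming each simulation is stored as a year plus the set of actions the Diplomate took. The record layout, the action labels, and the function name `completion_rates` are hypothetical and chosen for illustration; the ABFM's actual data model is not described here. A criterion counts as satisfied when any one of its acceptable actions appears in the simulation.

```python
from collections import defaultdict

# HYPOTHETICAL criterion definitions: any listed action satisfies the criterion.
CRITERIA = {
    "Long Term Asthma Control Medication": {"inhaled steroid", "leukotriene inhibitor"},
    "Written Action Plan": {"written action plan"},
}

# HYPOTHETICAL simulation records (year completed, actions taken).
simulations = [
    {"year": 2006, "actions": {"inhaled steroid"}},
    {"year": 2006, "actions": {"chest examination"}},
    {"year": 2007, "actions": {"leukotriene inhibitor", "written action plan"}},
    {"year": 2007, "actions": {"written action plan"}},
]


def completion_rates(records, criteria):
    """Percentage of simulations per year in which each criterion was satisfied."""
    totals = defaultdict(int)
    met = defaultdict(int)
    for record in records:
        totals[record["year"]] += 1
        for name, acceptable in criteria.items():
            # A non-empty intersection means at least one suitable action was taken.
            if record["actions"] & acceptable:
                met[(record["year"], name)] += 1
    return {
        (year, name): 100.0 * met[(year, name)] / totals[year]
        for year in totals
        for name in criteria
    }


yearly_rates = completion_rates(simulations, CRITERIA)
for key in sorted(yearly_rates):
    print(key, yearly_rates[key])
```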
Table 2.
Asthma criteria completion rates from 2005 through 2010
| SOAP | Criterion name | ABFM | EPR-3 | Completion (%) |
|---|---|---|---|---|
| S | Allergies and Occupational Exposure | C | B | 90 |
| S | Dyspnea on Exertion | C | B | 64 |
| S | Chest Tightness | C | B | 50 |
| S | Cough | C | B | 79 |
| S | Dyspnea | C | B | 68 |
| S | Wheezing | C | B | 73 |
| S | Environmental Tobacco Smoke | B | C | 32 |
| S | Family History of Asthma | C | NR | 85 |
| S | Frequency of Short Acting Beta-Agonist Use | C | B | 52 |
| S | Nocturnal Cough | C | B | 51 |
| S | Smoking History | A | C | 74 |
| S | Sputum | C | NR | 32 |
| O | Chest Examination | C | NR | 97 |
| O | Nose Examination | C | NR | 77 |
| P | Home monitoring (peak flow log) | C | A | 57 |
| P | Influenza Vaccination | B | A | 61 |
| P | Long Term Asthma Control Medication | A | A | 68 |
| P | Short-Acting Beta-Agonist (SABA) | A | A | 93 |
| P | Written Action Plan | A | B | 61 |
Strength of evidence supporting criteria was not significantly correlated with completion of criteria overall (ABFM p=0.15; EPR-3 p=0.43) or in 2010 (ABFM p=0.33; EPR-3 p=0.096). Although long- and short-acting beta-adrenergic agonists were prescribed frequently, influenza vaccination and written action plans were used less often despite similarly strong supporting evidence.
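For readers unfamiliar with the rank sums test, the sketch below computes a tie-corrected Kruskal-Wallis H statistic in plain Python for the pooled 2005 to 2010 completion rates in Table 2, grouped by ABFM SORT grade. It is an illustration of the technique only: the exact inputs and settings used in JMP are not specified, and the chi-square approximation is crude with groups this small, so the output need not match the p-values reported above. The function name `kruskal_wallis` is an assumption of this sketch.

```python
import math
from collections import defaultdict

# Completion rates (%) from Table 2, grouped by ABFM SORT grade.
rates_by_grade = {
    "A": [74, 68, 93, 61],
    "B": [32, 61],
    "C": [90, 64, 50, 79, 68, 73, 85, 52, 51, 32, 97, 77, 57],
}


def kruskal_wallis(groups):
    """Return the tie-corrected H statistic and its degrees of freedom."""
    pooled = sorted(v for values in groups.values() for v in values)
    n = len(pooled)
    # Give each distinct value the average of the 1-based positions it occupies.
    positions = defaultdict(list)
    for position, value in enumerate(pooled, start=1):
        positions[value].append(position)
    rank = {value: sum(p) / len(p) for value, p in positions.items()}
    h = 12.0 / (n * (n + 1)) * sum(
        sum(rank[v] for v in values) ** 2 / len(values)
        for values in groups.values()
    ) - 3 * (n + 1)
    # Standard correction for tied values.
    tie_term = sum(len(p) ** 3 - len(p) for p in positions.values())
    h /= 1 - tie_term / (n ** 3 - n)
    return h, len(groups) - 1


h_stat, df = kruskal_wallis(rates_by_grade)
# With exactly 2 degrees of freedom the chi-square upper tail is exp(-H / 2).
p_value = math.exp(-h_stat / 2) if df == 2 else None
print(h_stat, df, p_value)
```

The same function could be applied to the EPR-3 grades, but that grouping has more than three categories, so the closed-form tail used here would need to be replaced by a general chi-square survival function.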
Figure 2 illustrates changes in completion rates for symptom-related criteria from 2005 to 2010. The increases in completion of queries about allergies, exertional dyspnea, and family history of asthma generated process control signals (2005 rates were below the lower control limit for the series; 2006 rates were within limits). These signals may reflect changes in the user interface or criteria definitions. The remaining symptom completion rates are stable over time.
Figure 2.
Subjective asthma criteria completion rates
Figure 3 illustrates changes in objective queries and plans over the same time period. Physical examination maneuvers require minimal searching in ClinSim, and have high completion rates.
Figure 3.
Objective and management asthma criteria completion rates
Home monitoring (peak flow), influenza vaccination, and written plans all increased markedly from 2006 to 2007, coincident with the addition of related content in the MCQ section. For home monitoring and written plans, the 2005 and 2006 rates are below the lower control limits for the series, and the moving ranges are above the upper control limit in 2007. The influenza vaccine plot shows 5 rising points, just short of a signal; the largest moving range change occurred in 2007, but was within control limits.
The last two SABA rates are above the upper control limit for the series, and the moving range is above the upper control limit in 2008 (Figure 4). This small but significant rise is not associated with any deliberate effort to promote SABA. The MCQ section contained a number of items regarding the use of SABA during asthma exacerbations, but the VP was not designed to portray an exacerbation.
Figure 4.
SABA process control charts
The last control medication and nasal exam rates are below the lower control limits for these series. The control medication change may complement the increased SABA rate, but is otherwise unexplained. The nose exam rate change in 2010, confirmed by a moving range signal, is related to a change in criterion definition.
DISCUSSION
We believe that recent criterion completion rates primarily reflect whether users thought to take an action, rather than user interface barriers. The virtual patient user interface requires the user to type a keyword from any history question they wish to ask. This is a well-understood and reliable interface, so omissions in history-taking generally mean that users did not think to ask the question during the simulation. Similarly, vaccines and medications are listed in fairly intuitive and readily accessed interfaces that are consistent across all simulations. Again, omission is likely to indicate oversight. Home peak flow monitoring, influenza vaccination, and written action plans also are found consistently (figures 5 and 6), so in the current ClinSim interface their omission is likely to reflect oversight.
Figure 5.
Ordering an influenza vaccine
Figure 6.
Ordering home peak flow monitoring
The ClinSim interface must rapidly locate the queries and treatments that Diplomates seek if the ABFM is to use VP in periodic examinations. Periodic examinations are time limited and closed book, and their stakes are much higher. The improvement in criteria completion suggests progress toward this goal. However, the low 1st attempt pass rate for the new simulations may indicate ongoing issues with VP models, criteria definitions, or the ClinSim interface.
The 2007 improvements in completion of important asthma management criteria occurred without interface or definition changes. Two explanations are possible: 1) physicians could be giving answers that reflect changes in their clinical behavior, distinct from MOC, or 2) physicians could be learning to complete VP criteria from preceding MOC activities. It would be pleasantly surprising if the 2007 EPR-3 guideline were disseminated so rapidly and effectively that it changed physician behavior with real patients, and the change in VP criteria completion simply reflected a shift in current practice. Given the great difficulties involved in guideline dissemination, this strikes us as unlikely. The second explanation seems more likely: that the multiple-choice segment of the MOC exercise cued actions with the VP, or that physicians learned from colleagues to anticipate these criteria. This would explain the increased use of home monitoring and written action plans, and may contribute to the delayed increase in SABA use. If any of these explain the improvement, then physicians are rehearsing new behaviors to complete the VP criteria. If this pattern is replicated with other content areas, and predicts similar changes in real patient care, MOC could become an important new tool for disseminating guideline content.
The low rate of completion of some asthma symptom questions could reflect a value of information problem: nearly all Diplomates inquire about at least one asthma symptom, and most inquire about 2 or more. If the symptom questions were taken together, performance would be quite high. However, the value of these questions depends heavily on the responses given by the VP. If the Diplomate could classify the VP as having severe persistent asthma on the basis of questions already asked, the remainder of the symptom survey would become irrelevant. On the other hand, if the VP portrays a milder stage of disease, such as moderate persistent asthma, then there is some information to be gleaned from each symptom question – any answer could raise the stage and intensify the treatment strategy, and therefore each answer resolves some ambiguity, whether the VP endorses or denies the symptom. Questions not asked could be viewed as a sign of premature closure. Nevertheless, each response that does not indicate a more serious stage of disease makes stage escalation with further questioning less likely. Also, some Diplomates might assume that the response to one symptom question encompasses another: If the patient denies dyspnea, a Diplomate might guess that there is no dyspnea on exertion, either. These are important considerations in the development of more refined scoring procedures.
CONCLUSION
Physicians managing virtual patients have improved their criteria completion rates over time, due in part to user interface enhancements, but especially in response to changes in preceding multiple choice question content. The exercise may result in at least one rehearsal of unfamiliar management techniques. These observations suggest that MOC could change physician behavior and outcomes in real patient encounters.
References
- 1. Hagen MD, Ivins DJ, Puffer JC, et al. Maintenance of Certification for Family Physicians (MC-FP) Self Assessment Modules (SAMs): The First Year. J Am Board Fam Med. 2006 Jul-Aug;19(4):398–403. doi: 10.3122/jabfm.19.4.398.
- 2. Hagen MD, Sumner W, Roussel G, Rovinelli R, Xu J. Computer-based testing in family practice certification and recertification. J Am Board Fam Pract. 2003 May-Jun;16(3):227–232. doi: 10.3122/jabfm.16.3.227.
- 3. National Asthma Education and Prevention Program Expert Panel Report 3 (EPR-3): Guidelines for the Diagnosis and Management of Asthma-Summary Report 2007. J Allergy Clin Immunol. 2007 Nov;120(5 Suppl):S94–138. doi: 10.1016/j.jaci.2007.09.043.
- 4. Ebell MH, Siwek J, Weiss BD, et al. Simplifying the language of evidence to improve patient care: Strength of recommendation taxonomy (SORT): a patient-centered approach to grading evidence in medical literature. J Fam Pract. 2004 Feb;53(2):111–120.
- 5. Ebell MH, Siwek J, Weiss BD, et al. Strength of recommendation taxonomy (SORT): a patient-centered approach to grading evidence in the medical literature. J Am Board Fam Pract. 2004 Jan-Feb;17(1):59–67. doi: 10.3122/jabfm.17.1.59.






