Abstract
Clinicians from different care settings can distort the problem list so that it no longer conveys a patient’s actual health status, affecting quality and patient safety. To measure this effect, a reference standard was built and used to derive a problem-list-based model. Real-world problem lists were used to derive an ideal categorization cutoff score. The model was tested against patient records to categorize problem lists as either having longitudinal inconsistencies or not, and did so successfully with ~87% accuracy, ~83% sensitivity, and ~89% specificity. This new model can be used to quantify intervention effects, can be reported in problem list studies, and can measure problem list changes driven by policy, workflow, or system changes.
ATTENDEE TAKE-AWAY
This presentation will help attendees better understand the competing interests caregivers have, how these interests can negatively affect patient record quality, and how that effect can be measured using data science techniques.
INTRODUCTION
Clinician under-utilization of the electronic problem list is a long-recognized issue with many potential causes, such as the time burden required to maintain such records1,2, the lack of financial incentive3,4, and a lack of personal benefit for maintaining the record5-10. Many estimates put the problem list’s accuracy at only 50%11-13. Such low-quality medical records can lead to patient harm14, which is why it is crucial to understand the root causes of poor problem-list usage in more detail. These studies have proposed some reasons why the problem list is inaccurate and underutilized. Our prior research15 suggests that competing clinician interests may also be contributing to longitudinal inconsistencies. Literature searches revealed neither the prevalence of this type of issue nor commonly reported metrics for this scenario.
Typical problem list metrics include the average number of problems on a patient’s problem list, the time the clinician takes entering or maintaining the list, completeness of the problem list, how clinicians feel about problem list usage16,17, and how often a clinical decision support system (CDSS) alert was accepted or ignored based loosely on problem list data18. None of these measure unjustified modifications to the problem list caused by competing clinician interests. A new model was therefore developed to measure what we refer to as Tension: a loss of fidelity on a patient’s problem list due solely to caregiver preferences rather than medical necessity. An example of Tension is a clinician marking a problem as “inactive” simply to remove it from the problem list because it is not perceived as relevant to the clinician’s current encounter with the patient.
MATERIALS AND METHODS
To develop a model to measure Tension, the methodology illustrated by Kirshner19 and van Walraven20,21 was followed and validated using insights from Kelly22 (see Figure 1). The methodology includes establishing validity, building a reference standard, building a scoring matrix, building and training a model, and testing the model.
Figure 1.

Research Design Process
Establishing Validity
The first step was to achieve face validity: does a group of experts understand the issue being described, do they agree that it is an issue, and do they agree with the plan to measure it? A group of 13 clinicians from Intermountain Healthcare (IH), the Salt Lake City Veterans Affairs Hospital, and the University of Utah Healthcare system met to discuss sociotechnical clinical information system issues (Figure 1, box A). The scenario was explained, and they agreed it was an issue they had observed but were uncertain how to measure. As a result, this group decided to build a survey to send to other clinicians.
The survey aimed to identify scenarios that would need to arise in a problem list before clinicians felt it exhibited Tension, and it listed each of the problem types available in the legacy IH electronic medical record (EMR) system. The IH data warehouse uses a single longitudinal problem list per patient and does not have a separate inpatient or billing problem list (Figure 1, box B).
Two problem states were defined: T1 represents the state of the problem at the previous encounter with the outpatient clinician; T2 represents the state of that problem at the next encounter with an outpatient clinician (Figure 2). Responses from a REDCap survey23 (Figure 1, box C) scored how likely it was that each problem state change indicated Tension, using a 1-7 Likert scale.
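As a minimal sketch (not the authors’ analysis code), survey responses exported in long format could be summarized per state-change pattern as in Table 1; the field names below are illustrative rather than the actual REDCap export fields.

```python
import pandas as pd

# Hypothetical long-format survey export: one row per respondent per pattern.
rows = [
    ("Diagnosis",   "Active", "Error",    7),
    ("Diagnosis",   "Active", "Error",    5),
    ("Diagnosis",   "Active", "Error",    6),
    ("Diagnosis",   "Active", "Resolved", 6),
    ("Diagnosis",   "Active", "Resolved", 7),
    ("Diagnosis",   "Active", "Resolved", 4),
    ("Soc./Fam Hx", "Active", "Inactive", 2),
    ("Soc./Fam Hx", "Active", "Inactive", 3),
    ("Soc./Fam Hx", "Active", "Inactive", 2),
]
responses = pd.DataFrame(rows, columns=["problem_type", "t1_status", "t2_status", "likert"])

summary = (
    responses
    .groupby(["problem_type", "t1_status", "t2_status"])["likert"]
    .agg(mean="mean",
         mode=lambda s: s.mode().iloc[0],   # most frequent Likert value
         skewness="skew")                    # sample skewness, as reported in Table 1
    .sort_values("mean", ascending=False)    # Table 1 is sorted by the mean
)
print(summary)
```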
Figure 2.
Example problem lists at initial (T1) and subsequent (T2) encounters. Status changes are highlighted in T2 and may reflect Tension. Clinicians used these side-by-side lists to identify the presence or absence of Tension.
Build a Reference Standard
The survey results were used to determine which patterns the clinicians felt were most likely to result in Tension. Based on these results, 201 problem lists in the IH electronic data warehouse (EDW) containing such patterns of Tension (Table 2) were identified (Figure 1, box D). Those problem lists were then scored by a new group of three clinicians, who marked each list as having Tension (1) or not (0) (Figure 1, box E; Figure 2).
Table 2.
Count of discrete problem attributes scored by clinicians to denote Tension.
| All 3 clinicians agree the list was in Tension. | | | | |
| Chronicity | Problem type | T1 Status | T2 Status | Frequency |
| Chronic | Diagnosis | Active | Error | 20 |
| Acute | Diagnosis | Active | Error | 4 |
| 2 clinicians agree the list was in Tension. | | | | |
| Chronicity | Problem type | T1 Status | T2 Status | Frequency |
| Acute | Diagnosis | Active | Error | 14 |
| Chronic | Diagnosis | Active | Resolved | 11 |
| Chronic | Diagnosis | Active | Inactive | 4 |
| Chronic | Diagnosis | Active | Error | 4 |
| Acute | Diagnosis | Resolved | Active | 2 |
| A single clinician thinks the list was in Tension. | | | | |
| Chronicity | Problem type | T1 Status | T2 Status | Frequency |
| Acute | Diagnosis | Active | Resolved | 10 |
| Chronic | Diagnosis | Resolved | Active | 9 |
| Acute | Diagnosis | Resolved | Active | 7 |
| Chronic | Diagnosis | Active | Inactive | 6 |
| Chronic | Diagnosis | Active | Error | 6 |
| Acute | Diagnosis | Active | Inactive | 4 |
Sample size calculation
Reliability is demonstrated with test characteristics (sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve), reported with 95% confidence intervals. The sample size was calculated based on precision, that is, a sample large enough to provide a satisfactorily narrow confidence interval24,25 for honing and testing the model.
Using the precision method for calculating sample size, a standard 5% margin of error was used with a patient population of 49,813 (discussed in “Inclusion and Exclusion,” below). Since it was uncertain how the scoring clinicians would score the problem lists, a conservative response distribution of 50% was used; the confidence level was lowered to 85%, resulting in a sample size of 201.
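A minimal sketch of this precision-based calculation, assuming the usual normal-approximation formula with a finite population correction (the exact calculator and rounding conventions used in the study are not stated, so the result only approximates the reported 201):

```python
from scipy.stats import norm

def precision_sample_size(population, margin_of_error, confidence, p=0.5):
    """Sample size for estimating a proportion with a given margin of error,
    using the normal approximation and a finite population correction."""
    z = norm.ppf(1 - (1 - confidence) / 2)      # two-sided critical value
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return n0 / (1 + (n0 - 1) / population)     # finite population correction

# Values from the text: N = 49,813, 5% margin of error, 85% confidence,
# 50% assumed response distribution.
n = precision_sample_size(49_813, 0.05, 0.85)
print(round(n))   # ~206 under this convention; the study reports 201
```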
The sample set was split into two-thirds for a training set (n = 134) and one-third for a test set (n = 67) using a proportional stratified sample of one-third without Tension, one-third with questionable Tension, and one-third with obvious Tension. It was anticipated that the expert chart reviewer would identify half of these as having Tension, i.e., n = 33 with Tension and n = 34 without Tension on the reference standard. It was assumed that a sensitivity and specificity of 70% could both be achieved (23/33 = 70%). A Wilson 95% confidence interval would then be (53%, 83%), which is considered narrow enough for the sensitivity and specificity estimates to be sufficiently precise.
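That interval can be reproduced with any standard Wilson implementation; a sketch (not the authors’ code) using statsmodels:

```python
from statsmodels.stats.proportion import proportion_confint

# 23 correctly classified out of the 33 anticipated Tension lists (~70%).
low, high = proportion_confint(count=23, nobs=33, alpha=0.05, method="wilson")
print(f"Wilson 95% CI: ({low:.0%}, {high:.0%})")   # -> (53%, 83%)
```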
Inclusion and exclusion criteria
The data for the reference standard came from historical IH problem list data in the EDW. To be included, a patient had to have at least five problems on the problem list, as well as two encounters: the first in an outpatient setting, the second in an inpatient or emergency department (ED) setting. The problem list also needed at least one of the Tension scenarios identified in the final scoring scheme. The 23 survey respondents gave insight into what the polled clinicians felt would indicate Tension (Table 1).
Table 1.
Survey Results from clinicians, sorted by the mean value. Scores in the 1-7 columns are the percentage of total respondents who made that selection.
| Type | T1 | T2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Mean | Mode | Skewness |
| Diagnosis | Active | Error | 9.09 | 0 | 9.09 | 9.09 | 13.63 | 22.72 | 36.36 | 5.318 | 7 | -1.052 |
| Diagnosis | Active | Resolved | 0 | 9.09 | 4.54 | 13.63 | 27.27 | 9.09 | 36.36 | 5.318 | 7 | -0.591 |
| Diagnosis | Active | Inactive | 4.54 | 18.18 | 4.54 | 36.36 | 13.63 | 4.54 | 18.18 | 4.227 | 4 | 0.104 |
| Soc./Fam Hx | Active | Resolved | 0 | 22.72 | 22.72 | 18.18 | 9.09 | 18.18 | 9.09 | 4.054 | 4 | 0.341 |
| Soc./Fam Hx | Active | Error | 0 | 22.72 | 18.18 | 22.72 | 9.09 | 13.63 | 13.63 | 4.033 | 4 | 0.339 |
| Soc./Fam Hx | Active | Inactive | 4.54 | 27.27 | 18.18 | 13.63 | 9.09 | 18.18 | 9.09 | 3.863 | 2 | 0.293 |
| Diagnosis | Inactive | Error | 4.54 | 18.18 | 27.27 | 22.72 | 13.63 | 0 | 13.63 | 3.772 | 3 | 0.608 |
| Diagnosis | Error | Active | 18.18 | 13.63 | 22.72 | 18.18 | 9.09 | 9.09 | 9.09 | 3.5 | 3 | 0.364 |
| Diagnosis | Inactive | Resolved | 4.54 | 31.81 | 22.72 | 18.18 | 13.63 | 0 | 9.09 | 3.409 | 2 | 0.816 |
| Soc./Fam Hx | Error | Active | 13.63 | 27.27 | 22.72 | 13.63 | 13.63 | 4.54 | 4.54 | 3.181 | 2 | 0.612 |
| Diagnosis | Resolved | Error | 13.63 | 45.45 | 9.09 | 13.63 | 4.54 | 4.54 | 9.09 | 3 | 2 | 1.057 |
| Diagnosis | Inactive | Active | 13.63 | 27.27 | 27.27 | 22.72 | 9.09 | 0 | 0 | 2.863 | 3 | 0.098 |
| Soc./Fam Hx | Inactive | Active | 13.63 | 36.36 | 18.18 | 22.72 | 4.54 | 0 | 4.54 | 2.863 | 2 | 0.997 |
| Soc./Fam Hx | Resolved | Error | 4.54 | 54.54 | 18.18 | 4.54 | 13.63 | 0 | 4.54 | 2.863 | 2 | 1.375 |
| Soc./Fam Hx | Inactive | Resolved | 13.63 | 36.36 | 22.72 | 18.18 | 4.54 | 0 | 4.54 | 2.818 | 2 | 1.116 |
| Soc./Fam Hx | Resolved | Active | 22.72 | 31.81 | 13.63 | 13.63 | 13.63 | 0 | 4.54 | 2.818 | 2 | 0.815 |
| Diagnosis | Resolved | Active | 22.72 | 31.81 | 13.63 | 18.18 | 9.09 | 0 | 4.54 | 2.772 | 2 | 0.877 |
| Soc./Fam Hx | Inactive | Error | 9.09 | 50 | 13.63 | 18.18 | 4.54 | 0 | 4.54 | 2.772 | 2 | 1.347 |
| Soc./Fam Hx | Error | Inactive | 18.18 | 45.45 | 13.63 | 4.54 | 13.63 | 0 | 4.54 | 2.681 | 2 | 1.2 |
| Diagnosis | Error | Inactive | 27.27 | 31.81 | 22.72 | 4.54 | 4.54 | 4.54 | 4.54 | 2.59 | 2 | 1.263 |
| Soc./Fam Hx | Resolved | Inactive | 22.72 | 50 | 9.09 | 4.54 | 9.09 | 0 | 4.54 | 2.454 | 2 | 1.544 |
| Soc./Fam Hx | Error | Resolved | 22.72 | 50 | 9.09 | 4.54 | 9.09 | 0 | 4.54 | 2.454 | 2 | 1.544 |
| Diagnosis | Error | Resolved | 40.9 | 31.81 | 9.09 | 4.54 | 4.54 | 4.54 | 4.54 | 2.318 | 1 | 1.414 |
| Diagnosis | Resolved | Inactive | 36.36 | 36.36 | 13.63 | 4.54 | 4.54 | 0 | 4.54 | 2.227 | 1 | 1.722 |
Using the scored patterns, a proportional stratified sample was built from the population of 49,813 patient problem lists. This was done by identifying relevant problem list characteristics (chronicity, problem type, and problem status at both encounters) and then stratifying the pool into three non-overlapping subsets based on whether the problem list showed no Tension (low-ranked patterns), questionable Tension (medium-ranked patterns), or obvious Tension (top-ranked patterns). Samples were randomly pulled from each stratum until there was a pool of 201 patient problem lists.
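A minimal sketch of that draw, assuming each candidate list has already been labeled with its stratum (column and helper names are illustrative, not the actual EDW schema):

```python
import pandas as pd

def build_reference_sample(pool: pd.DataFrame, total: int = 201, seed: int = 0) -> pd.DataFrame:
    """Randomly draw an equal share from each stratum ('none', 'questionable',
    'obvious'), mirroring the one-third-per-stratum split described above."""
    per_stratum = total // 3
    parts = [
        group.sample(n=min(per_stratum, len(group)), random_state=seed)
        for _, group in pool.groupby("stratum")
    ]
    return pd.concat(parts).sample(frac=1, random_state=seed)  # shuffle the order

# pool = <49,813 candidate problem lists labeled by survey-ranked pattern>
# reference_lists = build_reference_sample(pool)
```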
Internal Consistency and Construct Validity Testing
Scoring 201 problem lists is a time-consuming task for a clinician, raising the concern that lists scored later in the session would be scored less reliably than those at the start due to fatigue. To control for this, each clinician received their 201 lists in random order. Eight duplicate problem lists were introduced to serve as a repeated measure, testing whether the clinician scored them consistently and thereby acting as their own control. Using the known-group method, three problem lists with no status changes were also added; these should receive a score of 0, testing whether the clinician had correctly understood the instructions and strengthening construct validity. These additional control lists brought the total to 212 problem lists; only the original 201 lists were later scored for patterns.
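A minimal sketch of assembling one clinician’s scoring packet under these rules (identifiers and counts mirror the text; everything else is illustrative):

```python
import random

def build_scoring_packet(reference_ids, no_change_ids, duplicates=8, seed=0):
    """Assemble one clinician's randomized packet: the 201 reference lists,
    plus 8 of them repeated (intra-rater reliability) and 3 lists with no
    status changes that should score 0 (known-group check) -> 212 lists."""
    rng = random.Random(seed)
    repeats = rng.sample(reference_ids, duplicates)   # lists the rater sees twice
    packet = list(reference_ids) + repeats + list(no_change_ids)
    rng.shuffle(packet)                               # random order controls for fatigue
    return packet

# Illustrative identifiers only (real lists would come from the EDW):
packet = build_scoring_packet([f"list_{i:03d}" for i in range(201)],
                              ["control_a", "control_b", "control_c"])
print(len(packet))   # 212
```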
Build and Hone the Model
The clinician’s task was to mark each list with a “1” if they concluded the problem list might have Tension, or a “0” if it did not. After reading each clinician the same introductory paragraph describing Tension, we let them decide on their own how that concept would manifest itself in these lists. Clinicians were blinded to each other’s responses. The results are presented in the Discussion section below.
Two-thirds of the reference standard data were taken at random (Figure 1, box F). For the problem lists that received a score of “1”, the problem type, chronicity, and status change combinations were recorded. Those patterns were subdivided into three groups: (1) all three clinicians thought the list was in Tension; (2) two clinicians indicated Tension; and (3) a single clinician thought the list was in Tension (Table 2) (Figure 1, box G).
To calculate the cutoff score, a receiver operating characteristic (ROC) curve was built (Figure 1, box H). The ROC curve identifies the threshold value that maximizes sensitivity and specificity simultaneously; this allows quantification of the model’s ability to discriminate between lists in Tension and lists not in Tension (Figure 1, box I).
After the model was trained on two-thirds of the reference standard data and an ideal cutoff score was determined, the testing phase began. The scoring matrix and cutoff score were applied to the final one-third of the reference standard lists (Figure 1, box J), classifying each problem list as being in Tension or not (Figure 1, box K). The results from the new model were compared to the clinician-built reference standard to see how well the model classified a problem list as having Tension. Those results were quantified using sensitivity and specificity (Figure 1, box L) and graphed to show how well the model classified a list as being in Tension or not (Figure 1, box M).
RESULTS
Establish Validity
Forty-five outpatient clinicians from IH were chosen to take the survey due to their high level of interest in problem lists; 23 responded, a response rate of 51%. The scores from the survey were tallied and ordered by mean, mode, and skewness in Table 1. Only five combinations had a mean Likert value of 4 or higher.
Build a Reference Standard
The example in Figure 2 shows a list with definite Tension, as scored by the clinician experts. For all 201 scored problem lists, a tally was kept of those that received a “1”, to determine which lists were marked as having Tension by all three clinicians (n = 32), by two clinicians (n = 45), and by just one clinician (n = 47). None of the clinicians felt that the remaining 77 problem lists exhibited Tension, which is close to the original estimate of 67 used when calculating the sample size.
Reliability
Inter-rater reliability was calculated on the n = 201 problem lists, where each of the three raters scored all 201 lists. The reliability was moderate [kappa = 63%, 95% CI (53%, 71%)]. Intra-rater reliability was assessed by interspersing two copies of an additional eight problem lists of varying complexity into the stack of 201 problem lists, so that each rater scored them twice, making it unlikely that they would remember the score they assigned the first list. The intra-rater reliability was 100%.
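The paper does not state which kappa statistic was used; for three raters each scoring all 201 lists, Fleiss’ kappa is one common choice. A minimal sketch with illustrative ratings:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings array: one row per problem list, one column per rater,
# values 0 (no Tension) or 1 (Tension). The real data had 201 rows.
ratings = np.array([
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
])

table, _ = aggregate_raters(ratings)        # counts of each category per list
print(fleiss_kappa(table, method="fleiss"))
```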
Build a scoring matrix
Two-thirds (134) of the 201 scored lists were taken at random to hone the model. An analysis was performed on the problem lists that received a score of “1”. Those results were tallied as shown in Table 2.
A common approach to predictive modeling is to let a regression equation provide weights for the predictors, where the weights are the regression coefficients or are derived from them. In this situation, however, the predictors had thousands of possible values (multiple diagnoses, each changing status across time). Instead, a scheme based on clinical importance and content knowledge was used, assigning each pattern a power-of-two score ranked by clinician agreement and frequency (Table 3); a sketch of applying these weights follows the table.
Table 3.
Scoring matrix derived from the most frequently scored patterns.
| Chronicity | Type | T1 Status | T2 Status | Frequency | Clinician agreement | Power score |
| Chronic | Diagnosis | Active | Error | 30 | 3 | 16 (2⁴) |
| Acute | Diagnosis | Active | Error | 18 | 2 | 8 (2³) |
| Chronic | Diagnosis | Active | Resolved | 11 | 2 | 4 (2²) |
| Acute | Diagnosis | Active | Resolved | 10 | 1 | 2 (2¹) |
| Chronic | Diagnosis | Resolved | Active | 9 | 1 | 1 (2⁰) |
| Acute | Diagnosis | Resolved | Active | 9 | 1 | 1 (2⁰) |
| No change/add only | | | | | | 0 |
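A minimal sketch of applying Table 3 as a lookup keyed on (chronicity, type, T1 status, T2 status), assuming, as the Discussion examples suggest, that a list’s Tension score aggregates the power scores of its individual status changes; the data structures here are illustrative, not the actual implementation.

```python
# Power scores from Table 3, keyed by (chronicity, type, T1 status, T2 status).
POWER_SCORES = {
    ("Chronic", "Diagnosis", "Active",   "Error"):    16,
    ("Acute",   "Diagnosis", "Active",   "Error"):     8,
    ("Chronic", "Diagnosis", "Active",   "Resolved"):  4,
    ("Acute",   "Diagnosis", "Active",   "Resolved"):  2,
    ("Chronic", "Diagnosis", "Resolved", "Active"):    1,
    ("Acute",   "Diagnosis", "Resolved", "Active"):    1,
}

def tension_score(problem_list):
    """Sum the power score of every T1->T2 status change on one problem list.
    `problem_list` is an iterable of (chronicity, type, t1_status, t2_status)
    tuples; unmatched patterns (no change, additions) contribute 0."""
    return sum(POWER_SCORES.get(change, 0) for change in problem_list)

# Example: one chronic diagnosis moved Active -> Resolved, one problem was added.
example = [("Chronic", "Diagnosis", "Active", "Resolved"),
           ("Acute", "Diagnosis", None, "Active")]
print(tension_score(example))   # 4
```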
Cutoff score
Following the standard receiver operating characteristic (ROC) curve approach, we found the ideal point on the ROC curve that optimized both sensitivity and specificity by plotting the training-data model’s ability to classify the problem lists correctly while varying the discrimination threshold. The ideal cutoff score was determined to be 3 (Figure 3): problem lists with a Tension score of 3 or higher are categorized as being in Tension, and those with a score below 3 are categorized as having no Tension. The area under the curve (AUC), calculated to determine how well the model classified a problem list as having Tension or not, was 0.84673.
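Not the authors’ code, but the same cutoff search can be sketched with scikit-learn, picking the threshold that maximizes Youden’s J (sensitivity + specificity - 1); the labels and scores below are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# y_true: reference-standard labels (1 = Tension); scores: matrix Tension scores.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
scores = np.array([16, 4, 8, 0, 2, 1, 4, 0])

fpr, tpr, thresholds = roc_curve(y_true, scores)
best = np.argmax(tpr - fpr)          # index of the threshold maximizing Youden's J
print("cutoff:", thresholds[best])   # threshold for these toy scores
print("AUC:", roc_auc_score(y_true, scores))
# On the real 134-list training set the study reports a cutoff of 3 and AUC = 0.847.
```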
Figure 3.

Receiver Operating Characteristics
Prevalence
Using the scoring matrix and the cutoff score to categorize problem lists as being in Tension, Table 4 depicts the prevalence of Tension within the entire IH organization during a two-year window from 2015 to 2017 and compares this with Intermountain Medical Center and with the clinicians in our study.
Table 4.
Prevalence of Tension at different levels of granularity.
| Location | Population | Problem lists with a Tension item | Problem lists with a score of 3 or higher (Tension) |
| Intermountain Healthcare | 348,071 | 11,790 (3.4%) | 834 (0.25%) |
| Intermountain Medical Center | 71,858 | 2,878 (4%) | 191 (0.27%) |
| Intervention clinicians only | 49,813 | 1,491 (3%) | 104 (0.21%) |
Test the Model
Each of the final 68 lists (one-third) in the test set was scored using the matrix, and the results were compared to the reference standard to calculate the model’s classification ability. For the test set, the true positive count is 25, the true negative count is 34, the false positive count is 4, and the false negative count is 5. Sensitivity is 83.3% (95% CI: 65.28%–94.36%), and specificity is 89.47% (95% CI: 75.2%–97.06%). Positive predictive value is 86.2% (95% CI: 70.93%–94.12%); negative predictive value is 87.17% (95% CI: 75.2%–93.85%). Accuracy is 86.76% (95% CI: 76.36%–93.77%).
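These point estimates follow directly from the reported confusion-matrix counts; a quick check (the confidence intervals require a separate interval method):

```python
tp, tn, fp, fn = 25, 34, 4, 5   # test-set counts reported above

sensitivity = tp / (tp + fn)                     # 0.833
specificity = tn / (tn + fp)                     # 0.895
ppv         = tp / (tp + fp)                     # 0.862
npv         = tn / (tn + fn)                     # 0.872
accuracy    = (tp + tn) / (tp + tn + fp + fn)    # 0.868

print(f"sensitivity={sensitivity:.1%}  specificity={specificity:.1%}  "
      f"PPV={ppv:.1%}  NPV={npv:.1%}  accuracy={accuracy:.1%}")
```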
Finally, the results were graphed to visualize how well the new model classified problem lists in Tension, as seen in Figure 4. The Tension series shows that most of the problem lists in the test set that received scores over the cutoff of 3 did have Tension, while the lists below 3 did not. This is visually evident in the separation between the Tension-scored lists and the non-Tension lists, whose peaks are far from one another.
Figure 4.

Classification Ability of the Model
DISCUSSION
The majority of the lists marked as having Tension involved status changes that removed items from the list, such as a change to Error, and fewer involved changes that merely make a problem harder to find within the list, such as changing a problem to Inactive or Resolved. This knowledge may aid in the development of improved problem list systems and workflows.
Our new model misclassified 9 of the 68 lists in the test set. Some lists did not have Tension yet received scores of 4 and 16 (Figure 4). The only change on the list with a score of 16 was a diagnosis of “Bicuspid aortic valve” that was active in T1 but was marked as being in error in T2. According to the scoring matrix, this deserved a score of 16 and should have been in Tension; however, only one of the three clinicians scored this list as being in Tension. Likewise, on the problem list with a score of 4, a single chronic diagnosis of “Colitis” moved from active in T1 to resolved in T2. This earns a score of 4 from the scoring matrix, but again, only one clinician marked the list as being in Tension. Understanding the reasons for these scores should improve the model’s overall accuracy.
Conversely, one problem list had a diagnosis of “Pregnancy” that was active in T1 and inactive in T2. This pattern was not one of the final scoring patterns because it is a normal evolution of an acute problem; however, it was marked as being in Tension by two different clinicians. This suggests that problem type, chronicity, and status metadata alone may not be sufficient, and that it may be necessary to also examine the specific diagnoses themselves. From previous work15, however, it was clear that this is no small task: the top 80% of problems that clinicians used (not the entire ICD/SNOMED) were pre-coordinated, and there were still over 8,000 diagnoses. Doing similar work for Tension would not only take considerable time but would also diminish the generalizability of the model.
Several problem lists that had no changes to existing problems were marked as being in Tension. These were thought to be mistakes, but upon further discussion the clinicians remarked that these lists were in Tension because the sheer number of new problems added at the T2 snapshot warranted follow-up with the patient or the prior caregiver. Additionally, one clinician remarked that if many problems were added at one time, it could be because many problems had been removed earlier and were merely being added back. The addition of problems was not considered an event that may lead to Tension, primarily because adding problems is the regular use of a problem list; accounting for the simultaneous addition of many problems may be needed to improve the accuracy of the model.
While there is evidence from the inter-rater reliability that other clinicians may have scored the lists similarly, there is no guarantee that the lists chosen were the best representatives to use for scoring. The 201 problem lists were chosen randomly from each stratification group. It might be more fruitful to purposefully choose the lists to test for specific boundary conditions, to assess granularity and thresholds of detection criteria.
Because this study was performed retrospectively at a single organization, policy, workflow, and system differences were not controlled variables. At the time the problem list data were collected, IH did not have an organization-wide policy about how problem lists fit into the clinical workflow, leaving decisions to each clinician. Due to a lack of policy, some results may not be generalizable to other organizations with different processes and systems in place. Expanding this study to other organizations would likely provide a better measure of Tension and regress towards the true mean.
A lack of standard, easily reportable metrics hinders progress and makes it difficult to describe changes to the problem list in a common vernacular. We have shown that our proposed Tension model can identify longitudinal inconsistencies that decrease the reliability and use of the problem list. Calculating the Tension score before and after a change in policy, incentive, workflow, or system allows the effect of the intervention to be quantified, serving as a quality assessment measure to improve problem list usage.
Acknowledgments
We are grateful to Naveen Maram MD for his efforts in recruiting clinicians for this study and to Casey Rommel for his assistance with statistics.
References
- 1.Weed LL. Medical records that guide and teach. NEJM. 1968;278. doi: 10.1056/NEJM196803142781105.
- 2.Weed LL, Weed L. Medicine in denial. CreateSpace Independent Publishing Platform; 2011. p. 280.
- 3.Bice M, Bronnert J, Goodell S, Herrin B, Scichilone R, Scott R. Problem lists in health records: ownership, standardization, and accountability. J AHIMA. 2012.
- 4.Hummel J. Standardizing the problem list in the ambulatory electronic health record to improve patient care. 2012.
- 5.Acker B, Bronnert J, Brown T, Clark JS, Dunagan B, Elmer T, et al. Problem list guidance in the EHR. J AHIMA. 2011;82(9):52–8.
- 6.Wright A, Feblowitz J, Maloney FL, Henkin S, Bates DW. Use of an electronic problem list by primary care providers and specialists. J Gen Intern Med. 2012;27(8):968–73. doi: 10.1007/s11606-012-2033-5.
- 7.Kaplan DM. Clear writing, clear thinking and the disappearing art of the problem list. J Hosp Med. 2007;2(4):199–202. doi: 10.1002/jhm.242.
- 8.Abrams JS, Davis JH. Advantages of the problem-oriented medical record in the care of the severely injured patient. J Trauma. 1974;14(5):361–9. doi: 10.1097/00005373-197405000-00002.
- 9.Bormel J. Problem lists are the keys to meaningful use. Put the big picture on your problem list. Health Manag Technol. 2011;32(2):40–1.
- 10.Campbell JR. Strategies for problem list implementation in a complex clinical enterprise. Proc AMIA Symp. 1998:285–9.
- 11.Wright A, McCoy AB, Hickman TT, Hilaire DS, Borbolla D, Bowes WA 3rd, et al. Problem list completeness in electronic health records: a multi-site study and assessment of success factors. Int J Med Inform. 2015;84(10):784–90. doi: 10.1016/j.ijmedinf.2015.06.011.
- 12.Szeto HC, Coleman RK, Gholami P, Hoffman BB, Goldstein MK. Accuracy of computerized outpatient diagnoses in a Veterans Affairs general medicine clinic. Am J Manag Care. 2002;8:37–43.
- 13.Lauteslager M, Brouwer HJ, Mohrs J, Bindels PJ, Grundmeijer HG. The patient as a source to improve the medical record. Fam Pract. 2002;19(2):167–71. doi: 10.1093/fampra/19.2.167.
- 14.Hartung DM, Hunt J, Siemienczuk J, Miller H, Touchette DR. Clinical implications of an accurate problem list on heart failure treatment. J Gen Intern Med. 2005;20(2):143–7. doi: 10.1111/j.1525-1497.2005.40206.x.
- 15.Hodge CM, Kuttler KG, Bowes WA 3rd, Narus SP. Problem management module: an innovative system to improve problem list workflow. AMIA Annu Symp Proc. 2014;2014:661–70.
- 16.Hodge CM, Narus SP. Electronic problem lists: a thematic analysis of a systematic literature review to identify aspects critical to success. JAMIA. 2018;25(5):603–13. doi: 10.1093/jamia/ocy011.
- 17.Wright A, Maloney FL, Feblowitz JC. Clinician attitudes toward and use of electronic problem lists: a thematic analysis. BMC Med Inform Decis Mak. 2011;11(36). doi: 10.1186/1472-6947-11-36.
- 18.Wright A, Pang J, Feblowitz JC, Maloney FL, Wilcox AR, McLoughlin KS, et al. Improving completeness of electronic problem lists through clinical decision support: a randomized, controlled trial. JAMIA. 2012;19(4):555–61. doi: 10.1136/amiajnl-2011-000521.
- 19.Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chron Dis. 1985;38(1):27–36. doi: 10.1016/0021-9681(85)90005-0.
- 20.van Walraven C, Dhalla IA, Bell C, Etchells E, Stiell IG, Zarnke K, et al. Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community. CMAJ. 2010;182(6):551. doi: 10.1503/cmaj.091117.
- 21.van Walraven C, Wong J, Forster AJ. Derivation and validation of a diagnostic score based on case-mix groups to predict 30-day death or urgent readmission. Open Med. 2012;6(3).
- 22.Kelly PA, O'Malley KJ, Kallen MA, Ford ME. Integrating validity theory with use of measurement instruments in clinical settings. Health Serv Res. 2005;40(5 Pt 2):1605–19. doi: 10.1111/j.1475-6773.2005.00445.x.
- 23.Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research Electronic Data Capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81. doi: 10.1016/j.jbi.2008.08.010.
- 24.Bristol DR. Sample sizes for constructing confidence intervals and testing hypotheses. Stat Med. 1989;8(7):803–11. doi: 10.1002/sim.4780080705.
- 25.Ye T, Yi Y. Sample size calculations in clinical research, third edition, by Shein-Chung Chow, Jun Shao, Hansheng Wang, and Yuliya Lokhnygina. Statistical Theory and Related Fields. 2017;1(2):265–6.

