Abstract
Patients have diverse health information needs, and secure messaging through patient portals is an emerging means by which such needs are expressed and met. As patient portal adoption increases, growing volumes of secure messages may burden healthcare providers. Automated classification could expedite portal message triage and answering. We created four automated classifiers based on word content and natural language processing techniques to identify health information needs in 1000 patient-generated portal messages. Logistic regression and random forest classifiers detected single information needs well, with areas under the curve of 0.804–0.914. A logistic regression classifier accurately found the set of needs within a message, with a Jaccard index of 0.859 (95% Confidence Interval: (0.847, 0.871)). Automated classification of consumer health information needs expressed in patient portal messages is feasible and may allow direct linking to relevant resources or creation of institutional resources for commonly expressed needs.
Introduction
Patients have health information needs about a variety of topics including symptom management, medication side effects, prognosis, coping, where and from whom to get treatment, and financial assistance1–10. In 2000, Jones described the multitude of ways patients would attempt to answer their questions with informatics tools during the next decade11, such as consumer sites on the world wide web and electronic mail messages between patients and physicians as well as among patients with similar conditions. These predictions have largely proven true 1, 12–16. Recently, newer tools, such as patient portals, have emerged as another means of addressing consumer health information needs. Patient portals are web-based applications that enable patients to interact with their health information, healthcare systems, and providers17–19. Secure patient-provider messaging is one of the most popular functions of patient portals20–23. Several studies exploring the types of communication that occur through portal messaging have demonstrated the expression of important health and information needs involving prescription refills, interpretation of laboratory values, and requests for appointments20, 21, 24, 25. Other research has shown that clinical care can be delivered through portal messages24, 25. Patients may report new health problems, and these messages may result in further evaluation or treatment26.
Previous studies of the content of portal messages are limited in several ways. First, prior studies have analyzed only small numbers of messages, almost exclusively in the primary care setting. Patient portals are now being widely deployed across specialties in many healthcare institutions27, 28. Second, most prior work has described the communications with only a narrow range of categories such as tests, appointments, symptoms, referrals, and general medical questions. Finally, previous research has employed manual analysis to characterize these messages. With millions of secure messages exchanged between patients and providers each year, better techniques are needed to fully understand the information needs expressed and health care delivered through patient portals.
Automated detection of concepts in consumer generated documents has been studied using natural language processing (NLP) techniques with standardized terminologies. Vocabularies have been developed to classify clinical and consumer generated documents including the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) and the Consumer Health Vocabulary (CHV) vocabularies in the Unified Medical Language System (UMLS)29, 30. These vocabularies have been used in studies analyzing clinical text as well as patient-generated text on social media31–34. Previous work has shown that adding semantic types (STYs) to clinical questions leads to improved classification35. NLP techniques using standardized vocabularies and semantic types have not been studied in automated classification of patient portal messaging.
This prior research suggests that identification of consumer health information needs in portal messages should be possible, but to date, these techniques have not been applied to analyze portal communications. In this paper, we describe the development and evaluation of an automated classifier that employs NLP techniques and machine learning algorithms on patient-generated secure messaging text to identify types of consumer health information needs within a patient portal.
Methods
Setting
This study was conducted at Vanderbilt University Medical Center (VUMC), a private, non-profit institution that provides primary and regional referral care to over 500,000 patients annually. VUMC is located in middle Tennessee and serves both adults and children with over 900 inpatient beds, 50,000 inpatient admissions, and over 1 million outpatient visits per year. VUMC launched a patient portal called My Health at Vanderbilt (MHAV) in 2005 and completed deployment of the portal throughout the clinical enterprise by 2008. The MHAV patient portal is available to all patients who receive medical care at VUMC. MHAV provides a suite of common patient portal functions including access to selected portions of the electronic medical record, appointment scheduling, secure messaging, bill management, and delivery of personalized health information36, 37. MHAV currently has over 293,000 registered users, including more than 19,000 pediatric accounts, with over 255,000 logins per month.
Portal Messages
We collected the content of all patient-generated secure messages sent through the MHAV portal from the launch of the portal in 2005 through 2014. De-identified messages were extracted from the VUMC Synthetic Derivative (SD), a database containing a de-identified copy of all hospital medical records created for research purposes. Over 2.5 million patient-generated messages were present in the SD. One thousand individual messages were randomly selected from an equal distribution over the 10-year period of this data set for analysis in this study.
Taxonomy
Our research team developed a taxonomy of consumer health information needs and communications shown in Figure 1. This taxonomy provides a comprehensive model of the semantic types of consumer health communications. Previous literature describes classification of clinical and other healthcare provider needs38–41. Classifying consumer health information needs has been an ongoing research question1, 2, 4–10, 42–44, and some studies have examined patient and caregiver needs in selected diseases45, 46. However, existing taxonomies have been incomplete or difficult to use.
Figure 1.
The taxonomy of consumer health information needs.
Our taxonomy divides information needs and communications into five main categories: clinical information, medical, logistical, social, and other. We use the taxonomy to describe both needs and communications because it can be employed to categorize both questions and the answers to these questions. This taxonomy has evolved from a model of clinical information needs that express questions that require medical knowledge, such as those which could be answered by a consumer health information resource40. This component of the model is the most well developed as it has been employed to structure medical textbooks and evaluated on a diverse set of communications including patient journals and questions from patient and caregiver interviews40, 47. The model was then expanded to add the medical, logistical, and social needs that are expressed in other types of communications, such as portal messages. Medical needs are requests for delivery of medical care, such as the expression of a new symptom requiring management or an inquiry about a test result. Logistical needs are requests for pragmatic information, such as the location of a clinic or the copy of a medical record. The social category includes personal communications such as an expression of gratitude or a complaint. The other category covers communications that are incomplete or unintelligible. Portal messages can contain more than one type of communication. Components of the taxonomy have been validated with inter-rater reliability studies of classification of consumer questions.47
Gold Standard
A gold standard was developed by manual analysis of the types of needs and communications that were present in the selected 1000 messages. Two to three individuals reviewed the content of all 1000 messages and assigned all relevant categories to each message. Discrepancies were discussed and consensus achieved to produce this gold standard.
Our 1000 patient-generated messages contained 721 medical needs, 234 social needs, 121 clinical information needs, and 222 logistical needs (Table 1). Thirty-one of the patient-generated messages were categorized as other, which consisted of error messages, incomplete messages, or messages that were incomprehensible. Of the remaining messages, 676 contained one major category, 260 contained two, 30 contained three, and three contained all four major categories. We used a co-occurrence matrix (Table 2) to represent how often information needs from multiple categories are expressed in a single message (the value in each cell is the number of messages that contain both category x and category y, divided by the number of messages that contain category x).
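As an illustration, the row-normalized co-occurrence matrix described above can be computed from a binary message-by-category label matrix. The labels below are toy values for the sketch, not the study corpus:

```python
import numpy as np

# Toy label matrix: rows = messages, columns = need categories
# (1 if the category was annotated in the message). Illustrative data only.
categories = ["clinical_info", "medical", "logistical", "social"]
labels = np.array([
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

# Cell (x, y): share of messages containing category x that also contain
# category y, matching the row-normalized definition in the text.
counts = labels.T @ labels                   # raw pairwise co-occurrence counts
row_totals = np.diag(counts).reshape(-1, 1)  # messages per category
co_occurrence = counts / row_totals          # each diagonal entry becomes 1.0
```

Each row of `co_occurrence` sums the conditional overlap of one category with the others, so the diagonal is always 100%.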
Table 1:
Distribution of categories of messages, N (% of total messages)*

| Need Category | Category Present | Category Absent |
|---|---|---|
| Clinical Information | 121 (12.1%) | 879 (87.9%) |
| Medical | 721 (72.1%) | 279 (27.9%) |
| Logistical | 222 (22.2%) | 778 (77.8%) |
| Social | 234 (23.4%) | 766 (76.6%) |

*The full distribution of messages for each subcategory of the taxonomy can be obtained from the authors.
Automated Classifiers
Automated classifiers used message contents to learn which major categories of consumer health information needs were present according to the taxonomy described above. We built four classifiers to identify consumer health information need categories in portal messages: basic, Naïve Bayes, logistic regression, and random forest. The basic classifier used regular expressions to detect whether a few words for each consumer health need category appeared in the messages (Table 3). The three other classifiers used machine learning techniques to predict whether health information need categories were present in a message: Naïve Bayes, logistic regression, and random forests. To create the classifiers, we used Python's scikit-learn package48. We used Bernoulli Naïve Bayes with an alpha of 0.1 and random forests with 500 trees. The inputs to the machine learning classifiers consisted of Bag of Words (BoW), concept unique identifiers (CUIs), and semantic types (STYs). BoW represents each message as a bag (multiset) of its words, encoded as a vector of the number of times each word appears in the message. CUIs are unique concepts or meanings of words, and STYs are broad categories of concepts, both represented in the Unified Medical Language System (UMLS)49. All features were represented as matrices with messages as rows and features as columns. For BoW, the cells in a row contain the number of occurrences of each word in that message. CUIs and STYs were binary features, set to one or zero depending on whether the CUI or STY was present in the message. Common stop words were removed from messages for the BoW representation. To determine CUIs and STYs, we used the KnowledgeMap Concept Identifier (KMCI)50, a validated tool developed at Vanderbilt, to extract concepts from the message text using NLP and the UMLS.
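A rough sketch of this feature construction follows. It is not the study's actual pipeline: KMCI is not publicly runnable, so the CUIs and STYs below are hypothetical placeholders, and only the BoW/binary-indicator encoding mirrors the description above:

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MultiLabelBinarizer

# Toy messages and per-message UMLS annotations; in the study these came
# from KMCI, so the concept IDs and semantic types here are placeholders.
messages = [
    "Can you refill my blood pressure prescription?",
    "Thank you so much for seeing me yesterday.",
]
cuis = [["C0085580", "C0033080"], []]   # hypothetical concept unique identifiers
stys = [["Disease or Syndrome"], []]    # hypothetical semantic types

# BoW: per-message term counts with common English stop words removed.
bow = CountVectorizer(stop_words="english")
X_bow = bow.fit_transform(messages)

# CUIs and STYs: binary presence/absence indicators per message.
mlb_cui, mlb_sty = MultiLabelBinarizer(), MultiLabelBinarizer()
X_cui = mlb_cui.fit_transform(cuis)
X_sty = mlb_sty.fit_transform(stys)

# Combined feature matrix: one row per message, one column per feature.
X = hstack([X_bow, X_cui, X_sty])
```

Stacking the three blocks column-wise yields the message-by-feature matrix the classifiers consume.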
The machine learning classifiers were trained and tested on a gold standard corpus of 1000 documents with 5-fold cross validation.
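A minimal sketch of this training and evaluation loop, using a synthetic stand-in for the feature matrix and a single binary need label (the Naïve Bayes alpha of 0.1 and the 500 trees follow the paper; the data and remaining settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in for the message feature matrix and one binary category.
X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=0)

classifiers = {
    "naive_bayes": BernoulliNB(alpha=0.1),                 # alpha per the paper
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
}

# 5-fold cross-validated AUC for each classifier, mirroring the evaluation.
aucs = {name: cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        for name, clf in classifiers.items()}
```

In the study this procedure was repeated per need category and per feature set to produce the AUC tables below.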
Table 2:
Co-occurrence matrix showing, by row, the percentage of messages in one category that also contain another (e.g., 79% of messages with a clinical information need also have a medical need).
| Need Category | Clinical Information | Medical | Logistical | Social |
|---|---|---|---|---|
| Clinical Information | 100% | 79% | 12% | 7% |
| Medical | 13% | 100% | 19% | 10% |
| Logistical | 7% | 63% | 100% | 15% |
| Social | 4% | 31% | 7% | 100% |
Evaluation and Statistical Analysis
We sought to determine the ability of the classifiers to predict a single category of need in a message and to predict all of the major categories of needs within a message. We evaluated the ability of the machine learning classifiers to predict a single major category using areas under the receiver operating characteristic curve (AUCs). AUCs measure the ability of each classifier to learn whether a single category is present in a given patient-generated message. For our basic classifier, we determined the ability to identify an information need category through the presence of several typical words found in those categories within the message.
We used the Jaccard index51 to assess how well the classifiers learn the set of information needs in a single message, since multiple categories of information needs may be present in one message. The Jaccard index is a measure of the similarity between two sets: J(A, B) = |A ∩ B| / |A ∪ B|.
The Jaccard index was chosen for its ability to measure similarity between two sets of binary outcomes. It performs similarly in text classification tasks to other similarity metrics such as Pearson's correlation coefficient52. A Jaccard index of 1 indicates that sets A and B contain exactly the same elements, and a Jaccard index of 0 means the sets have no common elements. In our study, the gold standard annotated set represents A and the predicted set from each classifier represents B. We averaged the Jaccard indices across messages to estimate the overall ability to predict the set of information needs across the entire corpus. This study was approved as non-human subjects research by the VUMC Institutional Review Board.
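The per-message Jaccard computation and corpus-level averaging can be sketched as follows, using toy gold-standard and predicted label sets:

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets of category labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)

# Toy gold-standard vs. predicted need categories for three messages.
gold = [{"medical"}, {"medical", "social"}, {"logistical"}]
pred = [{"medical"}, {"medical"}, {"logistical", "social"}]

# Average per-message Jaccard index across the corpus.
mean_jaccard = sum(jaccard(g, p) for g, p in zip(gold, pred)) / len(gold)
```

Here the three messages score 1.0, 0.5, and 0.5, giving a corpus average of about 0.67; the empty-set convention is an assumption, as the paper does not state how messages with no predicted categories were scored.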
Results
The AUCs for the classifiers varied with the type of input used (Tables 4–7). The basic classifier's AUCs ranged from 0.674 to 0.848. AUCs for Naïve Bayes ranged from 0.557 to 0.796, with medical needs having the highest AUC when using BoW and STYs (AUC: 0.796; 95% Confidence Interval (CI): (0.779, 0.813)). Logistic regression's AUCs ranged from 0.814 to 0.883, with logistical needs using BoW, CUIs, and STYs having the highest AUC (AUC: 0.883; 95% CI: (0.863, 0.903)). Random forest's AUCs ranged from 0.804 to 0.914, with logistical needs using BoW and STYs having the highest AUC (AUC: 0.914; 95% CI: (0.883, 0.945)). The highest AUC for clinical information needs came from the basic classifier (AUC: 0.848; 95% CI: (0.843, 0.853)); for logistical needs, from random forest with BoW and STYs (AUC: 0.914; 95% CI: (0.883, 0.945)); for social needs, from random forest with BoW, CUIs, and STYs (AUC: 0.839; 95% CI: (0.822, 0.855)); and for medical needs, from logistic regression with BoW and CUIs (AUC: 0.870; 95% CI: (0.842, 0.897)).
Table 4.
The area under the curves (AUCs) of the machine learning classifiers for Clinical Information Needs with each type of input: Bag of Words (BoW), unique concept identifiers (CUIs), and semantic types (STYs). The highest AUC is bolded for each classifier.
| Feature Sets | # of Features | Basic Classifier |
|---|---|---|
| Words | 4 | 0.848 (0.843, 0.853) |

| Feature Sets | # of Features | Naïve Bayes | Logistic Regression | Random Forest |
|---|---|---|---|---|
| BoW | 3,194 | 0.743 (0.716, 0.771) | 0.796 (0.758, 0.834) | 0.795 (0.766, 0.823) |
| CUI | 2,059 | 0.698 (0.638, 0.757) | 0.777 (0.766, 0.788) | 0.786 (0.765, 0.808) |
| STY | 141 | 0.602 (0.569, 0.635) | 0.752 (0.718, 0.786) | 0.756 (0.722, 0.791) |
| BoW, CUI | 5,253 | **0.754 (0.715, 0.794)** | 0.805 (0.761, 0.850) | 0.802 (0.781, 0.824) |
| BoW, STY | 3,335 | 0.743 (0.719, 0.767) | **0.829 (0.789, 0.870)** | 0.802 (0.763, 0.841) |
| CUI, STY | 2,200 | 0.696 (0.635, 0.757) | 0.803 (0.762, 0.843) | 0.792 (0.759, 0.825) |
| BoW, CUI, STY | 5,394 | 0.751 (0.711, 0.790) | 0.828 (0.782, 0.873) | **0.804 (0.775, 0.832)** |
Table 5.
The area under the curves (AUCs) of the machine learning classifiers for Logistical Needs with each type of input: Bag of Words (BoW), unique concept identifiers (CUIs), and semantic types (STYs). The highest AUC is bolded for each classifier.
| Feature Sets | # of Features | Basic Classifier |
|---|---|---|
| Words | 4 | 0.819 (0.812, 0.826) |

| Feature Sets | # of Features | Naïve Bayes | Logistic Regression | Random Forest |
|---|---|---|---|---|
| BoW | 3,194 | 0.755 (0.734, 0.775) | 0.878 (0.841, 0.915) | 0.900 (0.865, 0.936) |
| CUI | 2,059 | 0.763 (0.737, 0.789) | 0.835 (0.787, 0.883) | 0.848 (0.812, 0.885) |
| STY | 141 | 0.709 (0.695, 0.723) | 0.784 (0.738, 0.830) | 0.784 (0.738, 0.830) |
| BoW, CUI | 5,253 | 0.776 (0.755, 0.796) | 0.876 (0.842, 0.910) | 0.909 (0.879, 0.938) |
| BoW, STY | 3,335 | 0.766 (0.743, 0.789) | 0.882 (0.858, 0.906) | **0.914 (0.883, 0.945)** |
| CUI, STY | 2,200 | 0.774 (0.741, 0.806) | 0.832 (0.811, 0.854) | 0.860 (0.815, 0.905) |
| BoW, CUI, STY | 5,394 | **0.782 (0.758, 0.805)** | **0.883 (0.863, 0.903)** | 0.909 (0.878, 0.941) |
Table 6.
The area under the curves (AUCs) of the machine learning classifiers for Social Needs with each type of input: Bag of Words (BoW), unique concept identifiers (CUIs), and semantic types (STYs). The highest AUC is bolded for each classifier.
| Feature Sets | # of Features | Basic Classifier |
|---|---|---|
| Words | 6 | 0.759 (0.736, 0.782) |

| Feature Sets | # of Features | Naïve Bayes | Logistic Regression | Random Forest |
|---|---|---|---|---|
| BoW | 3,194 | 0.673 (0.653, 0.692) | 0.791 (0.756, 0.826) | 0.810 (0.782, 0.837) |
| CUI | 2,059 | 0.557 (0.533, 0.580) | 0.703 (0.666, 0.740) | 0.725 (0.698, 0.753) |
| STY | 141 | 0.625 (0.608, 0.643) | 0.726 (0.694, 0.758) | 0.738 (0.705, 0.770) |
| BoW, CUI | 5,253 | 0.658 (0.629, 0.687) | 0.795 (0.762, 0.828) | 0.821 (0.789, 0.854) |
| BoW, STY | 3,335 | **0.693 (0.676, 0.710)** | 0.813 (0.786, 0.840) | 0.836 (0.827, 0.845) |
| CUI, STY | 2,200 | 0.593 (0.569, 0.616) | 0.741 (0.712, 0.770) | 0.752 (0.724, 0.780) |
| BoW, CUI, STY | 5,394 | 0.674 (0.652, 0.695) | **0.814 (0.786, 0.842)** | **0.839 (0.822, 0.855)** |
Table 7.
The area under the curves (AUCs) of the machine learning classifiers for Medical Needs with each type of input: Bag of Words (BoW), unique concept identifiers (CUIs), and semantic types (STYs). The highest AUC is bolded for each classifier.
| Feature Sets | # of Features | Basic Classifier |
|---|---|---|
| Words | 11 | 0.674 (0.663, 0.684) |

| Feature Sets | # of Features | Naïve Bayes | Logistic Regression | Random Forest |
|---|---|---|---|---|
| BoW | 3,194 | 0.780 (0.765, 0.796) | 0.861 (0.829, 0.894) | 0.842 (0.808, 0.875) |
| CUI | 2,059 | 0.669 (0.635, 0.704) | 0.817 (0.784, 0.850) | 0.801 (0.765, 0.838) |
| STY | 141 | 0.658 (0.639, 0.677) | 0.800 (0.776, 0.825) | 0.788 (0.749, 0.827) |
| BoW, CUI | 5,253 | 0.776 (0.751, 0.800) | **0.870 (0.842, 0.897)** | 0.843 (0.810, 0.875) |
| BoW, STY | 3,335 | **0.796 (0.779, 0.813)** | 0.865 (0.837, 0.893) | 0.848 (0.816, 0.880) |
| CUI, STY | 2,200 | 0.699 (0.666, 0.732) | 0.828 (0.795, 0.862) | 0.824 (0.786, 0.862) |
| BoW, CUI, STY | 5,394 | 0.781 (0.757, 0.805) | 0.869 (0.842, 0.895) | **0.849 (0.819, 0.878)** |
The Jaccard index for the basic classifier averaged over all 1000 documents was 0.674 (95% CI: (0.663, 0.684)). The average Jaccard indices for the machine learning classifiers also varied with the inputs (Table 8). Logistic regression's highest Jaccard index was 0.859 (95% CI: (0.847, 0.871)) when using BoW and CUIs, with or without STYs, while random forest's highest Jaccard index was 0.858 (95% CI: (0.847, 0.870)) with BoW and STYs. Naïve Bayes' highest Jaccard index was 0.776 (95% CI: (0.763, 0.790)) with BoW and STYs.
Discussion
We examined the ability to automatically classify the content of patient-generated messages from patient portals into consumer health information need categories. The classifiers we developed showed promise in identifying the types of consumer health information needs expressed in portal messages, but different types of needs were best identified by different approaches to automated classification. The best classifiers for each major category of information needs had high predictive ability and were able to determine which major categories were present in a single message. As adoption of patient portals increases, automated techniques may be needed to assist in managing growing volumes of secure messages. Automated classification of health information needs may aid in connecting patients to needed resources and in triaging portal messages. In addition, automated classifiers could support consumer health informatics research to understand the nature of communications and care delivered within patient portals. Such work could lead to better resources for commonly expressed information needs and might support compensation for care delivered online.
In this study, our gold standard had a majority of medical needs among messages. This finding supports previous literature20, 21 demonstrating the delivery of care through patient portals. However, 10–20% of messages discussed other content, such as clinical information needs about conditions or interventions, logistical issues involving billing and navigating the health care system, and social needs such as acknowledgements of care. Therefore, a patient portal can be used for many different health information needs beyond those previously reported20, 21, 24, 25. Our classifiers had different levels of performance for each category of information needs. Certain categories, such as clinical information needs, could be identified from a few words with a basic classifier. The clinical information needs expressed in portal messages were typically questions about test results and procedures with characteristic phrasing, which could be easily identified by a few common terms. The other categories required more sophisticated machine learning techniques because their expressions were more diverse. When particular combinations of words characterize some needs but not others, more complex methods are required to identify them. Finally, the meanings of words (concepts) or the categories words fall under (semantic types) may be important in determining whether a need is present. More complex methods such as Naïve Bayes, logistic regression, and random forests with NLP performed better in larger and more complex parts of the taxonomy.
Logistic regression and random forests performed similarly, and both outperformed Naïve Bayes. Logistic regression performed better for clinical information and medical needs, but worse than random forests for the other categories. Each classifier performed best with different inputs. Social needs were best identified using BoW and STYs for Naïve Bayes, but BoW, CUIs, and STYs for logistic regression and random forest. BoW and STYs worked best for determining logistical needs with random forests, while BoW, CUIs, and STYs were best for Naïve Bayes and logistic regression. However, the gains in AUC from adding NLP-derived CUIs and STYs over BoW alone were modest in most cases. These findings suggest that NLP tools may add little here, and that combinations of words drive prediction of information need categories. Several factors may explain this observation. Patient-generated text likely contains less formal biomedical language and thus fewer recognizable concepts than medical texts. Higher-order NLP methods, such as negation detection, also likely have little impact on content type. Thus, the current results do not demonstrate a strong need for NLP, but more research into NLP of patient-generated messages is needed. Based on the Jaccard index, logistic regression identified the full set of health information need categories in a single message better than the other methods, though random forest had a similar Jaccard index. Because each classifier performed best for different categories of information needs, a hybrid of classifiers might best determine which categories are present in a single message.
The performance of these classifiers may be limited by several factors. First, patient-generated messages are more likely to include misspellings, which may adversely affect information need identification. These messages may also contain uncommon abbreviations and different abbreviations for the same word. Second, automatic derivation of meaning from patient-generated text remains an ongoing challenge. Our classifiers may fail to capture the meaning of the text, and therefore cannot determine the category of information need. Third, we utilized a standardized vocabulary for our classifiers; however, standardized vocabularies may not capture the many different ways of expressing concepts.
Automatic classification of health information needs in patient portals has several potentially important applications. First, it could allow triaging of patient generated messages to different members of the health care team or information resources. For example, logistical needs are more likely to be answerable by an administrative assistant, whereas medical needs may require a nurse or physician to respond. Clinical information needs can be answered by an information resource such as an educational module or trusted web application. Therefore, automatic classification might enable routing of these messages appropriately without human intervention. Second, classifiers might be used to detect levels of urgency in messages. North et al. showed that occasionally patients will send potentially life-threatening symptoms through patient portals21. Utilizing automated classifiers to detect urgent messages could prevent adverse events by prioritizing responses or alerting a provider through an alternative means of communication. Finally, these classifiers could be used to determine health information needs that are frequently expressed in select patient populations, drive appropriate resource development, and potentially automatically respond to messages with links to appropriate resources.
This study has several limitations. First, this study was conducted at a single institution with a locally developed patient portal. Although the information needs seen in these messages are common needs that have been seen in other papers about patient portal messaging20, 21, our results may be limited by the unique policies and procedures developed for MHAV. Second, this study employed a small data set. Therefore, all information needs and the full breadth of their expression may not be adequately represented. Third, this study used older data, spanning 2005–2014, and some of the content of messages may have become antiquated due to secular trends. Our ongoing research projects will evaluate these methodologies on larger data sets and explore the performance of automated classifiers across clinical specialties where vocabularies and needs may differ.
Conclusions
We have created automated classifiers that show promising results in identifying the types of consumer health information needs expressed in secure messages sent through a patient portal. Certain classifiers determined different semantic categories of information needs better than others: the basic classifier was best at identifying clinical information needs, while logistic regression and random forests were better for logistical, social, and medical needs. NLP techniques can improve the ability to identify the types of information needs in patient-generated messages, but the improvements were modest. Additional research is needed to improve the performance of these classifiers to support applications such as message triage, question answering, resource development for common questions, and research on consumer health communications.
Table 3.
Basic classifier using words to determine if a message belongs to one of the major categories of health information needs.
| Need Category | Words |
|---|---|
| Clinical Information | question, normal, medication, procedure |
| Logistical | insurance, record, bill, cover |
| Social | thank you very much, thank you so much, thanks very much, thanks so much, appreciate, your time |
| Medical | refill, prescription, appointment, pain, hurt, lab, follow up, test, xray, ct, mri |
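A minimal sketch of such a keyword-based basic classifier, using an abbreviated subset of the Table 3 word lists (the word-boundary matching is an assumption; the paper states only that regular expressions were used):

```python
import re

# Abbreviated keyword lists from Table 3; the basic classifier flags a
# category when any of its words or phrases appears in the message.
KEYWORDS = {
    "clinical_information": ["question", "normal", "medication", "procedure"],
    "logistical": ["insurance", "record", "bill", "cover"],
    "social": ["thank you very much", "appreciate", "your time"],
    "medical": ["refill", "prescription", "appointment", "pain", "lab"],
}

# One alternation pattern per category, matched case-insensitively on
# whole words (word boundaries are an assumption of this sketch).
PATTERNS = {
    cat: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b",
                    re.IGNORECASE)
    for cat, words in KEYWORDS.items()
}

def basic_classify(message):
    """Return the set of need categories whose keywords match the message."""
    return {cat for cat, pat in PATTERNS.items() if pat.search(message)}
```

A message can match several patterns at once, mirroring the multi-label nature of portal messages: "I have a question about my bill" would be flagged as both a clinical information and a logistical need under these keyword lists.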
Table 8.
The average Jaccard indices with a 95% confidence interval of the machine learning classifiers for each type of input: Bag of Words (BoW), unique concept identifiers (CUIs), and semantic types (STYs). The highest Jaccard index is bolded for each classifier.
| Feature Sets | # of Features | Basic Classifier |
|---|---|---|
| Words | 24 | 0.674 (0.663, 0.684) |

| Feature Sets | # of Features | Naïve Bayes | Logistic Regression | Random Forest |
|---|---|---|---|---|
| BoW | 3,194 | 0.769 (0.756, 0.783) | 0.854 (0.841, 0.866) | 0.848 (0.836, 0.860) |
| CUI | 2,059 | 0.741 (0.728, 0.754) | 0.819 (0.808, 0.831) | 0.824 (0.813, 0.835) |
| STY | 141 | 0.774 (0.761, 0.788) | 0.804 (0.793, 0.816) | 0.810 (0.797, 0.824) |
| BoW, CUI | 5,253 | 0.761 (0.747, 0.775) | **0.859 (0.847, 0.871)** | 0.855 (0.844, 0.867) |
| BoW, STY | 3,335 | **0.776 (0.763, 0.790)** | 0.858 (0.846, 0.870) | **0.858 (0.847, 0.870)** |
| CUI, STY | 2,200 | 0.757 (0.744, 0.770) | 0.820 (0.807, 0.833) | 0.826 (0.814, 0.839) |
| BoW, CUI, STY | 5,394 | 0.769 (0.755, 0.783) | **0.859 (0.847, 0.871)** | 0.856 (0.845, 0.868) |
Acknowledgments
We are grateful to Shilo Anders, Ebone Ingram, and Jared Shenson for their assistance in creation of the manually annotated set of portal messages used as the gold standard for this research project. The portal message content used for the analyses was obtained from VUMC’s Synthetic Derivative, which is supported by institutional funding and by the Vanderbilt CTSA grant ULTR000445 from NCATS/NIH. Robert Cronin and Sharon Davis were supported by the 5T15LM007450–12 training grant from the National Library of Medicine.
References
- 1.Abdulla S, Vielhaber S, Machts J, Heinze H-J, Dengler R, Petri S. Information needs and information-seeking preferences of ALS patients and their carers. Amyotroph Lateral Scler Frontotemporal Degener. 2014:1–8. doi: 10.3109/21678421.2014.932385.
- 2.Archibald MM, Scott SD. The information needs of North American parents of children with asthma: a state-of-the-science review of the literature. J Pediatr Health Care. 2014;28(1):5–13.e2. doi: 10.1016/j.pedhc.2012.07.003.
- 3.Duggan C, Bates I. Medicine information needs of patients: the relationships between information needs, diagnosis and disease. Qual Saf Health Care. 2008;17(2):85–9. doi: 10.1136/qshc.2005.017590.
- 4.Galarce EM, Ramanadhan S, Weeks J, Schneider EC, Gray SW, Viswanath K. Class, race, ethnicity and information needs in post-treatment cancer patients. Patient Educ Couns. 2011;85(3):432–9. doi: 10.1016/j.pec.2011.01.030.
- 5.Harding R, Selman L, Beynon T, et al. Meeting the communication and information needs of chronic heart failure patients. J Pain Symptom Manage. 2008;36(2):149–56. doi: 10.1016/j.jpainsymman.2007.09.012.
- 6.Molassiotis A, Brunton L, Hodgetts J, et al. Prevalence and correlates of unmet supportive care needs in patients with resected invasive cutaneous melanoma. Ann Oncol. 2014. doi: 10.1093/annonc/mdu366.
- 7.Palisano RJ, Almarsi N, Chiarello LA, Orlin MN, Bagley A, Maggs J. Family needs of parents of children and youth with cerebral palsy. Child Care Health Dev. 2010;36(1):85–92. doi: 10.1111/j.1365-2214.2009.01030.x.
- 8.Shea-Budgell MA, Kostaras X, Myhill KP, Hagen NA. Information needs and sources of information for patients during cancer follow-up. Curr Oncol. 2014;21(4):165–73. doi: 10.3747/co.21.1932.
- 9.Umgelter K, Anetsberger A, Schmid S, Kochs E, Jungwirth B, Blobner M. Survey on the need for information during the preanesthesia visit. Anaesthesist. 2014.
- 10.Zirkzee E, Ndosi M, Vlieland TV, Meesters J. Measuring educational needs among patients with systemic lupus erythematosus (SLE) using the Dutch version of the Educational Needs Assessment Tool (D-ENAT). Lupus. 2014.
- 11.Jones R. Developments in consumer health informatics in the next decade. Health Libraries Review. 2000;17(1):26–31.
- 12.Cobb NK. Online consumer search strategies for smoking-cessation information. Am J Prev Med. 2010;38(3 Suppl):S429–32. doi: 10.1016/j.amepre.2009.12.001.
- 13.Mercado-Martínez FJ, Urias-Vázquez JE. Hispanic American kidney patients in the age of online social networks: content analysis of postings, 2010–2012. Rev Panam Salud Publica. 2014;35(5–6):392–8.
- 14.Katz SJ, Moyer CA, Cox DT, Stern DT. Effect of a triage-based E-mail system on clinic resource use and patient and physician satisfaction in primary care: a randomized controlled trial. J Gen Intern Med. 2003;18(9):736–44. doi: 10.1046/j.1525-1497.2003.20756.x.
- 15.Roter DL, Larson S, Sands DZ, Ford DE, Houston T. Can e-mail messages between patients and physicians be patient-centered? Health Commun. 2008;23(1):80–6. doi: 10.1080/10410230701807295.
- 16.White CB, Moyer CA, Stern DT, Katz SJ. A content analysis of e-mail communication between patients and their providers: patients get the message. J Am Med Inform Assoc. 2004;11(4):260–7. doi: 10.1197/jamia.M1445.
- 17.Patient portal. Wikipedia, the free encyclopedia.
- 18.Archer N, Fevrier-Thomas U, Lokker C, McKibbon KA, Straus SE. Personal health records: a scoping review. J Am Med Inform Assoc. 2011;18(4):515–22. doi: 10.1136/amiajnl-2011-000105.
- 19.HealthIT.gov. What is a patient portal? 2014.
- 20.Haun JN, Lind JD, Shimada SL, et al. Evaluating user experiences of the secure messaging tool on the Veterans Affairs’ patient portal system. J Med Internet Res. 2014;16(3). doi: 10.2196/jmir.2976.
- 21.North F, Crane SJ, Chaudhry R, et al. Impact of patient portal secure messages and electronic visits on adult primary care office visits. Telemed J E Health. 2014;20(3):192–8. doi: 10.1089/tmj.2013.0097.
- 22.Ralston JD, Hirsch IB, Hoath J, Mullen M, Cheadle A, Goldberg HI. Web-based collaborative care for type 2 diabetes: a pilot randomized trial. Diabetes Care. 2009;32(2):234–9. doi: 10.2337/dc08-1220.
- 23.Weingart SN, Rind D, Tofias Z, Sands DZ. Who uses the patient internet portal? The PatientSite experience. J Am Med Inform Assoc. 2006;13(1):91–5. doi: 10.1197/jamia.M1833.
- 24.Barnett TE, Chumbler NR, Vogel WB, Beyth RJ, Qin H, Kobb R. The effectiveness of a care coordination home telehealth program for veterans with diabetes mellitus: a 2-year follow-up. Am J Manag Care. 2006;12(8):467–74.
- 25.Ross SE, Moore LA, Earnest MA, Wittevrongel L, Lin CT. Providing a web-based online medical record with electronic communication capabilities to patients with congestive heart failure: randomized trial. J Med Internet Res. 2004;6(2). doi: 10.2196/jmir.6.2.e12.
- 26.Stiles RA, Deppen SA, Figaro MK, et al. Behind-the-scenes of patient-centered care: content analysis of electronic messaging among primary care clinic providers and staff. Med Care. 2007;45(12):1205–9. doi: 10.1097/MLR.0b013e318148490c.
- 27.Cronin RM, Davis SE, Shenson JA, Chen Q, Rosenbloom ST, Jackson GP. Growth of secure messaging through a patient portal as a form of outpatient interaction across clinical specialties. Appl Clin Inform. In press. doi: 10.4338/ACI-2014-12-RA-0117.
- 28.Shenson J, Cronin R, Davis S, Chen Q, Jackson G. Rapid growth in surgeons’ use of secure messaging in a patient portal. Surg Endosc. 2015:1–9. doi: 10.1007/s00464-015-4347-y.
- 29.Côté RA, Robboy S. Progress in medical information management. Systematized nomenclature of medicine (SNOMED). JAMA. 1980;243(8):756–62. doi: 10.1001/jama.1980.03300340032015.
- 30.Zeng QT, Tse T, Divita G, et al. Term identification methods for consumer health vocabulary development. J Med Internet Res. 2007;9(1). doi: 10.2196/jmir.9.1.e4.
- 31.Doing-Harris KM, Zeng-Treitler Q. Computer-assisted update of a consumer health vocabulary through mining of social network data. J Med Internet Res. 2011;13(2). doi: 10.2196/jmir.1636.
- 32.Jiang L, Yang CC. Using co-occurrence analysis to expand consumer health vocabularies from social media data. IEEE; 2013.
- 33.Kim J, Joo J, Shin Y. An exploratory study on the health information terms for the development of the consumer health vocabulary system. Stud Health Technol Inform. 2009:146.
- 34.Lee D, de Keizer N, Lau F, Cornet R. Literature review of SNOMED CT use. J Am Med Inform Assoc. 2014;21(e1):e11–9. doi: 10.1136/amiajnl-2013-001636.
- 35.Kobayashi T, Shyu C-R. Representing clinical questions by semantic type for better classification; AMIA Annual Symposium Proceedings; 2006. p. 987.
- 36.Allphin M. Patient Portals 2013: On Track for Meaningful Use? KLAS Research; 2013.
- 37.Osborn CY, Rosenbloom ST, Stenner SP, et al. MyHealthAtVanderbilt: policies and procedures governing patient portal functionality. J Am Med Inform Assoc. 2011;18(Suppl 1):i18–23. doi: 10.1136/amiajnl-2011-000184.
- 38.Allen M, Currie LM, Graham M, Bakken S, Patel VL, Cimino JJ. The classification of clinicians’ information needs while using a clinical information system. AMIA Annu Symp Proc. 2003:26–30.
- 39.Covell DG, Uman GC, Manning PR. Information needs in office practice: are they being met? Ann Intern Med. 1985;103(4):596–9. doi: 10.7326/0003-4819-103-4-596.
- 40.Purcell GP. Surgical textbooks: past, present, and future. Ann Surg. 2003;238(6 Suppl):S34–41. doi: 10.1097/01.sla.0000097525.33229.20.
- 41.Schnall R, Cimino JJ, Currie LM, Bakken S. Information needs of case managers caring for persons living with HIV. J Am Med Inform Assoc. 2011;18(3):305–8. doi: 10.1136/jamia.2010.006668.
- 42.Bender JL, Hohenadel J, Wong J, et al. What patients with cancer want to know about pain: a qualitative study. J Pain Symptom Manage. 2008;35(2):177–87. doi: 10.1016/j.jpainsymman.2007.03.011.
- 43.Phillips SA, Zorn MJ. Assessing consumer health information needs in a community hospital. Bull Med Libr Assoc. 1994;82(3):288–93.
- 44.Roberts K, Kilicoglu H, Fiszman M, Demner-Fushman D. Decomposing consumer health questions. ACL; 2014.
- 45.Alzougool B, Gray K, Chang S. An in-depth look at an informal carer’s information needs: a case study of a carer of a diabetic child. Electronic Journal of Health Informatics. 2009;4(1).
- 46.Boot CRL, Meijman FJ. Classifying health questions asked by the public using the ICPC-2 classification and a taxonomy of generic clinical questions: an empirical exploration of the feasibility. Health Commun. 2010;25(2):175–81. doi: 10.1080/10410230903544969.
- 47.Shenson JA, Ingram E, Colon N, Jackson GP. Application of a consumer health information needs taxonomy to questions in maternal-fetal care. AMIA Annu Symp Proc. In press.
- 48.Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
- 49.Humphreys BL, Lindberg DA, Schoolman HM, Barnett GO. The Unified Medical Language System: an informatics research collaboration. J Am Med Inform Assoc. 1998;5(1):1–11. doi: 10.1136/jamia.1998.0050001.
- 50.Denny JC, Spickard A, Miller RA, et al. Identifying UMLS concepts from ECG impressions using KnowledgeMap. AMIA Annu Symp Proc. 2005:196–200.
- 51.Real R, Vargas JM. The probabilistic basis of Jaccard’s index of similarity. Syst Biol. 1996:380–5.
- 52.Huang A. Similarity measures for text document clustering. Proceedings of the Sixth New Zealand Computer Science Research Student Conference (NZCSRSC2008); Christchurch, New Zealand; 2008.