Abstract
Background
Data collected by health care organizations consist of medical information and documentation of interactions with patients through different communication channels. This enables the health care organization to measure various features of its performance such as activity, efficiency, adherence to a treatment, and different quality indicators. This information can be linked to sociodemographic, clinical, and communication data with the health care providers and administrative teams. Analyzing all these measurements together may provide insights into the different types of patient behaviors or more accurately to the different types of interactions patients have with the health care organizations.
Objective
The primary aim of this study is to characterize usage profiles of the available communication channels with the health care organization. The main objective is to suggest new ways to encourage the usage of the most appropriate communication channel based on the patient’s profile. The first hypothesis is that the patient’s follow-up and clinical outcomes are influenced by the patient’s preferred communication channels with the health care organization. The second hypothesis is that the adoption of newly introduced communication channels between the patient and the health care organization is influenced by the patient’s sociodemographic or clinical profile. The third hypothesis is that the introduction of a new communication channel influences the usage of existing communication channels.
Methods
All relevant data will be extracted from the Clalit Health Services data warehouse, the largest health care management organization in Israel. Data analysis process will use data mining approach as a process of discovering new knowledge and dealing with processing data extracted with statistical methods, machine learning algorithms, and information visualization tools. More specifically, we will mainly use the k-means clustering algorithm for discretization purposes and patients’ profile building, a hierarchical clustering algorithm, and heat maps for generating a visualization of the different communication profiles. In addition, patients’ interviews will be conducted to complement the information drawn from the data analysis phase with the aim of suggesting ways to optimize existing communication flows.
Results
The project was funded in 2016. Data analysis is currently under way and the results are expected to be submitted for publication in 2019. Identification of patient profiles will allow the health care organization to improve its accessibility to patients and their engagement, which in turn will achieve a better treatment adherence, quality of care, and patient experience.
Conclusions
Defining solutions to increase patient accessibility to health care organization by matching the communication channels to the patient’s profile and to change the health care organization’s communication with the patient to a highly proactive one will increase the patient’s engagement according to his or her profile.
International Registered Report Identifier (IRRID)
RR1-10.2196/10734
Keywords: health communication, population characteristics, eHealth, mHealth, telehealth, health information systems, consumer health informatics, delivery of health care, machine learning
Introduction
Background
Health care organizations and patients communicate with each other using various communication channels [1,2]. Some of these communication channels are traditional: face-to-face meetings with a physician or a nurse, face-to-face interactions with the administrative staff, and phone calls. However, in the past decade, many health care organizations introduced novel methods of digital communication with patients such as text messages, emails, video calls, websites, and mobile apps. The communication channels between the health care organization and its patients have been examined and analyzed in previous studies [3-10].
Data mining and machine learning methodologies have been used to define or redefine clusters of patients according to their state of health and other sociodemographic data [11,12]. Recently, process mining has been used to try to improve communication between consumers and health care providers [13]. However, no studies attempting to cluster patients by combining medical, sociodemographic, or communication characteristics have been conducted and certainly not in a population as large as the one proposed in this study. We expect that such research will improve communication between patients, service providers, and medical organizations and will improve the quality of treatment and treatment effectiveness and responsiveness.
Aims and Objectives
Finding the circumstances and the extent to which different population segments use different communication channels, and specifically, the extent to which usage of newly introduced channels replaces the usage of more traditional channels will help us learn about the effectiveness of these new channels. Tying these population segments’ communication behavior with their sociodemographic profiles and health outcomes will help us establish the association between the 3, and it may help drive the hypotheses as to the causation. In addition, identifying communication-based population segments may help health care providers to use the most appropriate channels with each population segment, leading to more efficient and targeted communications, for example, identifying and quantifying the early adopters group will help the health care organization to estimate the usage level of a newly developed communication channel, its effectiveness in driving the intended message, and to some extent, its effect on health outcomes. Accordingly, this will also allow to improve the quality of treatment, treatment effectiveness, and responsiveness.
The aims of this retrospective data study are to assist health care policy makers to improve and personalize the communication between patients and health care professionals (eg, physicians and nurses). Communication improvement includes enhancing the accessibility of health care professionals by expanding the capabilities of current communication channels and introducing new ones. These communications will help to improve patient engagement with the treatment process, increase patient responsiveness to follow-up requirements and treatment, and improve patient experience with health care services. More specifically, the primary aim of this study is to characterize usage profiles in the available communication channels in the Clalit Health Services (Clalit), each one of them without considering the others and then all of them together. The second aim is to establish relationships between communication profiles, sociodemographic, and medical patients’ profiles. The main objective is to suggest new ways to encourage the usage of the most appropriate communication channel based on the patient’s profile. A secondary objective is to suggest ways for improving communication between the patient and the health care organization mainly through technological means.
Hypotheses
The first hypothesis is that the patient’s follow-up and clinical outcomes are influenced by the patient’s preferred channel(s) of communication with the health care organization. If this hypothesis is validated, the research will quantify the phenomenon.
The second hypothesis is that the adoption of newly introduced communication channels between the patient and the health care organization is influenced by the patient’s sociodemographic and/or clinical profile. If this hypothesis is validated, the research will identify sociodemographic and/or clinical attributes that affect the adoption of newly introduced communication channels.
The third hypothesis is that the introduction of a new communication channel influences the usage of existing communication channels. If this hypothesis is validated, the research will characterize the changes in usage of existing communication channels once a new communication channel is introduced.
Methods
Materials
This is a data-based study that analyzes information stored in Clalit electronic medical records (EMRs) and in logs documenting access to various communication channels between patients and Clalit, such as the internet personal health records, and telephone logs. Researchers have full access to Clalit EMRs and logs on the entire insured population of 4.53 million patients in 2015, which constitute 54% of the Israeli population of 8.38 million as of 2015. Data collected include demographic, clinical, and pharmacological information. In addition, we plan to conduct interviews with a representative sample of the patients to learn directly about the patients’ perceptions, their relationship with the various means of communication, patterns of use, and suggestions for improvement. We hope that this survey will provide supplementary information to the one we will receive from analyzing the data.
Clinical data from community and hospital settings and pharmacological data are routinely collected in the data warehouses (DWHs) of the health maintenance organization (HMO) and classified into the appropriate data world (eg, appointment scheduling, consultation with a physician, appointment with a specialist, diagnosis during hospitalization, medical services, and prescriptions). The information recorded includes sociodemographic data (gender, marital status, number of children at home, age, origin, socioeconomic status (SES), and place of residence), medical information (dates of specialist appointment, physician license number and the corresponding specialization, diagnoses, date of each diagnosis, prescriptions, acquisition of prescriptions, laboratory results, and imaging), and communication data (appointment date, date the appointment occurred, time elapsed between the scheduled appointment and the actual appointment, and the way the appointment was scheduled—through a medical secretary, call center, website, or mobile app). All relevant pieces of information include a patient identifier, which allows compiling all data relevant to a specific patient into a single record.
The information to be analyzed is extracted from the EHR DWH of Clalit and includes data collected between 2008 and 2016 for all relevant patients. The long duration of the study will allow us to identify changes in the ways patients interact with the HMO as a function of time and as a function of new communication channels the HMO introduced (eg, website, mobile apps, and the use of the short message service [SMS] text messaging). Accordingly, the patient can start or stop using 1 or more channels to interact with the HMO. The patients included in this study are aged 21 years and over and are members of Clalit for at least 1 year before 2008 and are still alive in 2016. We will focus our study on patients with chronic disease because we want to examine long-term adherence and efficacy. In addition, patients who suffer from 1 chronic disease or more have a high rate of resource consumption. In the United States, for example, 86% of health care spending is devoted to patients with chronic diseases [14]. In particular, we will examine diabetic patients, who in 2001 accounted for about 20% of the patient population [15]. We hope that the study will help optimize the processes in which these patients participate. The incidence of chronic diseases in general and of diabetes in particular is increasing over the years due to several factors, most notably the aging of the Israeli population. According to Clalit data, as of the end of 2014, more than 40% of the insured population had at least 1 diagnosis that is defined as chronic (eg, diabetes, asthma, heart disease, mental illness, and cancer). Patients with diabetes constitute more than 300,000 individuals with our inclusion criteria [16,17]. The profiles that will be found will help define the recommendations and policies that will improve communication with specific subpopulation groups and will increase the effectiveness of treatment and patient adherence. Chronic diseases are not spread uniformly by age; however, given the high cost of treating patients with chronic diseases, we believe it is more useful to concentrate on these patients despite this bias.
Ethics
Ethical approval for the study was granted by the Clalit ethical committee (147-15-COM2; January 26, 2016).
Methodologies
The communication between health care providers (ie, physicians, nurses, hospitals, and more globally, HMOs) and patients is studied by focusing, generally, only on 1 or 2 of the channels [1-12]. To fulfill our research aims and objectives, our analysis will consist of characterizing the usage profiles of existing nontechnological and technological communication channels over a period of 9 years, taking into account that Clalit has added and changed over the time the methods by which patients contact health care professionals (eg, the introduction of Web and mobile apps). Then, the sociodemographic and clinical profiles of each one of the different communication channels’ usage profiles will be defined. This will allow us to qualitatively evaluate the influence of the communication profile on patient’s engagement and follow-up quality.
As part of the analysis, we will evaluate impacts of new communication channels introduced over the research period. This will allow us to suggest future improvements to the communication between the patient and physician or nurse, with the aim of improving the work processes of the health organization.
This research is based on knowledge discovery in databases (KDD) methodologies [18,19]. KDD is an interdisciplinary discipline that deals with methodologies for the extraction and identification of valid, new, nontrivial patterns of data that have the potential to be useful and understandable [18-20]. The continued increase in the amounts of data available, a product of the unprecedented development of computer and communications technologies over the past two decades, created a unique opportunity to implement KDD methodologies. Data science experts from different disciplines are therefore challenged to find new and effective ways to extract and generate new knowledge from existing data.
In the analysis phase, we will use one-dimensional and multidimensional statistical methods as well as different data mining algorithms. The data mining stage is part of the KDD process and focuses mainly on the discovery of unknown patterns. For this purpose, we will use and tune, if necessary, data mining [21] and machine learning [22] algorithms for dealing with the multidimensional dataset (ie, sociodemographics, bio-clinical, and communication-related data over time), which will be explored in this study. The patterns found in this stage are then evaluated and interpreted to form the knowledge extracted from the KDD process.
The KDD process that will be developed and implemented in this research includes data collection and integration, early processing and cleaning of data, development and implementation of data mining algorithms to discover new knowledge and a qualitative research [18-20].
Data Acquisition
Clalit DWH is the main source of information the research uses, and a replication for research purposes is updated on a weekly basis. The data extracted from Clalit DWH for each patient comprise the following information:
-
Sociodemographic data
Date and country of birth and date of immigration when relevant
Date of death (allowing exclusion)
Start and end date of membership (allowing exclusion)
Gender
Ethnic sector (general Jewish, Arab, and ultra-orthodox Jewish)—the ethnic sector is determined according to the clinic at which the member receives primary care medicine. It is computed by the Clalit computer services unit by integrating geostatistical data from the Israeli Central Bureau of Statistics
Clinic-level SES (3 categories: low, mid, and high)—the SES is determined according to the clinic at which the member receives primary care medicine. It is computed by the Clalit computer services unit by integrating geostatistical data from the Israeli Central Bureau of Statistics
-
Bio-clinical
Body mass index (BMI) category (underweight, normal, overweight, obese, or unknown) [23]
Smoking status (current, past, never, or unknown)
Last available glycated hemoglobin (HbA1c) measurement reflecting the level of blood sugar control in patient with diabetes
Last available lipidemic profiling (high-density lipoprotein, low-density lipoprotein, triglycerides, and total cholesterol)
Adjusted clinical groups (ACG) [24]
Comorbidities according to the Clalit chronic diseases registry [15]
Proportion of days covered by treatment of diabetes when relevant based on purchase of drugs used in diabetes and more particularly by blood glucose lowering drugs excluding insulin (Anatomical Therapeutic Chemical Classification System codes starting with A10B) [25]
-
Communication or contacts with the HMO data
Appointments scheduling (through a medical secretary—data available since 2009, call center—data available since 2009, website—since 2011, or mobile app—since 2012)
Consultations with a physician or a nurse
Hospitalizations
Consultations at an emergency department
Nonqueue requests (eg, request for periodic checks, prescription renewal, and sick leave certificate) done without visiting but only by sending a request to a physician through a call to a medical secretary or a nurse or by completing a paper or an electronic form
Any purchases in a pharmacy of the HMO or purchase related to a prescription in other pharmacies having an agreement with the HMO
Prescription renewals by SMS—since 2015.
Data Preprocessing
Data Cleansing
After integrating the data collected and extracted from the Clalit’s DWH, we will prepare it for analysis. This stage includes cleansing of the data collected by Clalit’s DWH when necessary. The main objective of this phase is to reduce noise by detecting and removing or correcting outliers [26] in the dataset by evaluating the quality of the data [21]. An outlier is a data measurement that is inconsistent with other historical measurement data of the same individual (eg, outlaying height value, an exceptionally high number of consultations with a physician—a few hundred per year-). When a measurement-specific (eg, BMI) algorithm has been developed in-house by Clalit Research Institute (CRI) for epidemiological studies, outlier detection and data correction will be processed using it. For example, an algorithm screens data on BMI, weight, and height, to detect and handle outliers in the recording of 1 of these 3 measurements (eg, due to mistyping). When the CRI algorithms will not be relevant, outliers will be detected with statistical approaches such as median absolute deviation to find outliers (nonparametric due to lack of knowledge regarding the data distribution [27] and/or machine learning algorithms such as k-means [28]).
Data related to communication between patient and Clalit have not yet been fully processed and cleansed before, and accordingly, we may need to develop special cleaning and correction algorithms for these data. If data correction algorithms and/or algorithms that deal with cases of missing information do not exist for any given data in our database [29,30], we will use appropriate machine learning algorithms and/or statistical approaches [31,32] to correct and/or deal with missing data where needed. Examples of potential problems that we might encounter are identifying irrelevant entries (eg, entries related to quality assurance traffic and testing and entries that are not the result of human activity) and lack of full documentation. In addition, interface exposed to the user is a breathing interface and changes over time depending on the services that the HMO chooses to provide through the Web-based and app services. A new version of the website, for example, is released every 6 months. Data processing and analysis should reflect these changes.
Data Transformation
Many methods of machine learning and data mining require, as part of the preprocessing phase, a data reformulation such as a new categorization or a new grouping of numerical, categorical, or textual data to reduce the number of values each attribute has [28].
This step involves the use of techniques for reducing the number of dimensions or transduction methods to reduce the number of variables for analysis or to find invariant representations of the data [26,33-35].
For example, if we consider attributes with continuous values such as laboratory tests or clinical measurement having existing and defined scales in the literature, we will reformulate them into categorical values as a part of the dataset dimension reduction. For example, HbA1c values may be divided into 5 categories: excellent control (<6.5%), good control (6.5% to 7.5%), moderate control (7.5% to 8.9%), poor control (≥9%), and not available [36,16].
However, for attributes that do not have predefined scales in the literature or which are specific to Clalit, such as the number of appointments by using the HMO website or the number of visits to a physician per year, we will use the k-means clustering algorithm for discretization purposes in 6 groups of resource consumption: “No” (meaning not consuming of the related resource, so excluded from the k-means run and assigned to this group), “Small,” “Small-Moderate,” “Moderate,” “Moderate-Large,” and “Large.” The cluster bounds are validated, if necessary, by a domain expert (ie, a public health practitioner having some experience with the Clalit data).
Data Mining
For identifying population clusters, different machine learning methods and algorithms must be used. The main aim is to characterize usage profiles in the available communication channels. Considering the fact that we do not have prior knowledge on the data, we will use unsupervised machine learning algorithms [37-43] and will more particularly focus on k-means [38] and hierarchical clustering [37]. We choose to use these specific algorithms because they are relatively simple to communicate with people having less technical knowledge, such as decision and policy makers of the HMO, which will get the final analysis report and will need to implement its recommendations.
The first data mining goal is to find the number of hidden k clusters in the “Communication/contacts with the HMO data” or in other words, the number of different types of patient communication profiles. This will be performed on the available data of the year 2016 because by that time, data cleansing will be fully performed. As communication channels constantly evolve, we chose the most recent year to be the reference point to which previous years, with less communication channels, are compared with. The “Communication/contacts with the HMO data” of 2016 will be clustered as follows:
For each k between 2 and 100, 100 randomly selected samples of 20% of the cohort will be generated
For each sample, k-means will be run
For each run, the Ray-Turi criterion [44] will be computed
The results of the overall Ray-Turi criterion computation will be plotted on a graph
The elbow will be manually defined on the previously built plot for finding the relevant k.
Each cluster relates to a type of patient communication. This step allows reducing the patient communication profiles from the number of patients included into the cohort (more than 300,000 if we consider patients with diabetes) to a small one (at most less than a few dozen).
The second data mining goal is to generate a hierarchical clustering of the previously discovered clusters to allow understanding the similarities and dissimilarities between the communication patterns.
Descriptive statistics of sociodemographic, bio-medical, and communication data will be generated for each cluster.
On the basis of the previously built k clusters of “Communication/contacts with the HMO data” of 2016 and the related hierarchical clustering, we will generate descriptive statistics for each patient communication profiles (ie, cluster or set of patients) over the years (2008-2015).
Information Visualization
To provide user-friendly tools to decision and policy makers [45], allowing them to understand the different patient communication profiles and the strengths and weaknesses of each one, we will build heat maps for each year between 2008 and 2016 based on the previously generated hierarchical clustering of 2016 data.
Process Mining
Furthermore, we plan to implement algorithms and approaches from the field of process mining [46] to identify the changes in communication profiles over time, which may be the cause of treatment adherence changes. For example, process mining will allow us to model how patients with a similar communication profile (ie, patients within the same cluster) have changed their communication patterns with the HMO using the following channels:
Consulting with physicians and/or nurses
Scheduling appointments by using 1 or more of the following channels: through a medical secretary—data available since 2009, call center—data available since 2009, website—since 2011, or mobile app—since 2012
Overall interaction with the HMO (using the overall services).
Qualitative Research
Qualitative research of focus groups is the most effective means to fully understand factors that encourage or delay the use of communication interfaces with the health care organization. Focus groups enable the collection of information from a multicultural population [47] and discussion of new ideas that do not arise during personal interviews [48]. We designed the qualitative part of the proposed study based on the guidelines presented by King et al [49]. The qualitative part of the research will include between 1 and 8 focus groups depending on their usage level of the communications channels with Clalit. Each one of the focus groups will include up to 8 patients from the same area. Participants in the focus groups will be asked to complete a short sociodemographic questionnaire and sign an informed consent form. During the focus group meeting, the group facilitator will record the discussion and make important notes related to the participants’ nonverbal communication.
A guideline questionnaire for the focus groups will be constructed with the assistance of experts in the field and relevant literature. This questionnaire will evaluate factors that encourage or delay the use of communication channels with Clalit. The guiding questionnaire will include up to 10 open questions that will facilitate responses providing critical information, for example, “What factors contribute or will contribute to your use of the communication channel X?”; “What factors delay or will delay your usage of communication channel X?”; or “How do you think that communication channel X can be improved?”. The guiding questionnaire will be used to explore aspects that are relevant for better understanding the topic and will facilitate expanding the discussion to areas that the participants consider to be most significant.
The discussions in the focus groups will be recorded and transcribed. The transcripts of the focus group discussion will be analyzed in a phenomenological approach that emphasizes the patient’s unique and subjective perception through qualitative content analysis [50]. The coding process will begin with open coding (ie, identification of major categories), following by axial coding that results from 1 core phenomenon. Next, the data will be categorized according to this core phenomenon [51] and will be reviewed by external domain experts to ensure objectivity [49]. Sandelowski [52] notes that through qualitative content analysis, researchers can add new information to the existing one and gain new insights. The encoding and analysis will be performed by the principal investigators and the associate investigators, with the same encoding rules for guaranteeing homogeneous and consistent encoding [49]. In cases of disagreement regarding the encoding, an expanded forum will be held in which the majority decision prevails.
Results
This project was funded in 2016, and the research project is scheduled to be completed in 2019.
A preliminary analysis has been performed on the data of the year 2015 related to 309,460 patients with diabetes in 2015, aged 32 years and above, having the disease treated by Clalit for more than 7 years. Overall, 7 main communication patterns have been discovered.
The first cluster is of patients with relatively low contacts with the HMO in comparison with the rest of the population. Patients in these 2 groups tend to be relatively young (median age: 64 years) and less morbid (ACG between 3 and 4). Although patients in the first group tend to have a poor follow-up quality, 21.21% (18,779/88,524) of the patients were missing BMI measurement and 23.09% (20,436/88,524) were missing their HbA1c measurement in 2015; patients in the second cluster have an average follow-up quality: only 7.72% (6228/80,714) of the patients did not perform a BMI measurement and only 10.56% (8527/80,714) did not perform a HbA1c measurement. A possible explanation for this difference may be related to the tendency of the patients in the second group to resort mainly to human contact (face-to-face or by phone).
The next 2 clusters are of early adopters of technology. These diabetic patients interacted in 2015 with Clalit mainly through new digital platforms: the website (first group) or the mobile app (the second group). These patients also tend to use lesser medical services compared with the rest of the population, and their follow-up quality was better than the rest of the population: only 4.64% (1212/26,098) and 6.10% (1593/26,098) of the first group did not perform BMI and HbA1c tests in 2015, respectively, whereas 5.05% (603/11,945) and 6.93% (826/11,945) of the second group did not perform BMI and HbA1c tests in 2015, respectively.
The patients included in the fifth cluster are mainly using nursing services. They also tend not to schedule appointments. This subpopulation has a low SES (40.79%, 14,531/35,624). However, the follow-up of these patients is quite good (with 3.17% [1128/35,624] and 6.05% [2155/35,624] of these patients missing their BMI and HbA1c measurements, respectively). This is a clear effect of the nursing personnel involvement.
Patients in the last 2 clusters tend to be older than the rest of the patient population (aged more than 70 years) and with relatively high morbidity (ACG=5). Patients in the sixth cluster tend to be consumers of medical services that involve access to a human being, whereas patients in the seventh cluster tend to be heavy users of all medical services. They also tend to have one of the best follow-up rates: only 1.64% (825/38,070) and 4.38% (1668/38,070) of the patients in the sixth cluster have missed their BMI and HbA1c measurements, respectively, in 2015, whereas only 4.22% (1203/28,485) and 6.40% (1822/28,485) of the patients in the seventh cluster have missed their BMI and HbA1c measurements, respectively, in 2015.
Discussion
Overview
This research protocol deals with the identification of patient communication profiles. This knowledge will help the health care organization to increase the accessibility of patients to the services the health care organization provides and to improve patients’ engagement with the treatment process. This, in turn, may motivate the patient to achieve a better treatment adherence, improve quality of care, and generate better patient experience.
Expected Results and Future Directions
Analysis of communication patterns over time may reveal long-term behavior patterns as well as identify patterns at a higher abstraction level (eg, early adopters of technology and early adopters of services). It should be noted that the research is planned to be performed on data from a period that witnessed a significant yet gradual change in the communication channels Clalit provides its patients. Analyzing the response of the patient population to these changes will hopefully help improve the available communication channels as well as assist in formulating realistic expectations from the introduction of new communication channels, taking into consideration also the sociodemographic characteristics and clinical constraints as well as their previous communication patterns with the HMO.
By tuning its communication tools to patients’ preferences (eg, by translating the user interfaces of the electronic communications tools—website or apps—from Hebrew to other languages such as Arabic, English, Russian, Amharic, French, and Spanish), the health organization would (1) improve and increase accessibility to health care services, achieve better patient engagement and responsiveness to treatment, and improve quality of treatment and treatment experience within existing budgetary constraints and (2) increase patients’ engagement with the treatment process by transforming the communication scheme with each patient to a more proactive scheme, so as to better fit their profile.
Strengths and Limitations
Clalit insured and provided medical services to approximately 4.53 million patients in 2015 and is the largest health care provider in Israel. The data available spans all treatment providers including hospitals’ end emergency units. Nevertheless, overall ethnic distribution of the Clalit population does not fully reflect the overall Israeli demographic composition. The Clalit members comprise, in comparison with the Israeli general population, (1) a higher proportion of Arabs and a lower proportion of ultra-orthodox members and (2) a higher proportion of members having a low SES.
Another potential limitation is the decision to analyze only patients with diabetes. These patients may exhibit behaviors that are unique to this specific chronic disease and may not be shared by other chronic patients. Nevertheless, diabetes is 1 of the most common chronic diseases, with prevalence of approximately 7% within Clalit’s insured population.
Finally, this research is conducted on data of Israeli patients. The structure of the Israeli health care system as well as Israeli culture and norms may affect patients’ behavior and may not apply to patients in other geographical locations.
Acknowledgments
The research was supported by a grant from the Israel National Institute for Health Policy (#188-15).
Abbreviations
- ACG
adjusted clinical groups
- BMI
body mass index
- Clalit
Clalit Health Services
- CRI
Clalit Research Institute
- DWH
data warehouse
- EMR
electronic medical record
- HMO
health maintenance organization
- HbA 1c
glycated hemoglobin
- KDD
knowledge discovery in databases
- SES
socioeconomic status
- SMS
short message service
Footnotes
Conflicts of Interest: None declared.
References
- 1.Axén I, Bodin L, Bergström G, Halasz L, Lange F, Lövgren PW, Rosenbaum A, Leboeuf-Yde C, Jensen I. Clustering patients on the basis of their individual course of low back pain over a six month period. BMC Musculoskelet Disord. 2011 May 17;12:99. doi: 10.1186/1471-2474-12-99. https://bmcmusculoskeletdisord.biomedcentral.com/articles/10.1186/1471-2474-12-99 .1471-2474-12-99 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Rai A, Chen L, Pye J, Baird A. Understanding determinants of consumer mobile health usage intentions, assimilation, and channel preferences. J Med Internet Res. 2013 Aug 2;15(8):e149. doi: 10.2196/jmir.2635. http://www.jmir.org/2013/8/e149/ v15i8e149 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Hoffman AS, Volk RJ, Saarimaki A, Stirling C, Li LC, Härter M, Kamath GR, Llewellyn-Thomas H. Delivering patient decision aids on the internet: definitions, theories, current evidence, and emerging research areas. BMC Med Inform Decis Mak. 2013;13 Suppl 2:S13. doi: 10.1186/1472-6947-13-S2-S13. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-S2-S13 .1472-6947-13-S2-S13 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Beck F, Richard J, Nguyen-Thanh V, Montagni I, Parizot I, Renahy E. Use of the internet as a health information resource among French young adults: results from a nationally representative survey. J Med Internet Res. 2014 May 13;16(5):e128. doi: 10.2196/jmir.2934. http://www.jmir.org/2014/5/e128/ v16i5e128 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Moick M, Terlutter R. Physicians' motives for professional internet use and differences in attitudes toward the internet-informed patient, physician-patient communication, and prescribing behavior. Med 2 0. 2012 Jul 6;1(2):e2. doi: 10.2196/med20.1996. http://www.medicine20.com/2012/2/e2 .v1i2e2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Kritz M, Gschwandtner M, Stefanov V, Hanbury A, Samwald M. Utilization and perceived problems of online medical resources and search tools among different groups of European physicians. J Med Internet Res. 2013 Jun 26;15(6):e122. doi: 10.2196/jmir.2436. http://www.jmir.org/2013/6/e122/ v15i6e122 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Dugdale DC, Epstein R, Pantilat SZ. Time and the patient-physician relationship. J Gen Intern Med. 1999 Jan;14 Suppl 1:S34–40. doi: 10.1046/j.1525-1497.1999.00263.x. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1496869/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Weiner JP. Doctor-patient communication in the e-health era. Isr J Health Policy Res. 2012 Aug 28;1(1):33. doi: 10.1186/2045-4015-1-33. http://www.ijhpr.org/content/1/1/33 .2045-4015-1-33 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Peleg R, Avdalimov A, Freud T. Providing cell phone numbers and email addresses to patients: the physician's perspective. BMC Res Notes. 2011 Mar 23;4:76. doi: 10.1186/1756-0500-4-76. https://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-4-76 .1756-0500-4-76 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Peleg R, Nazarenko E. Providing cell phone numbers and e-mail addresses to patients: the patient's perspective, a cross sectional study. Isr J Health Policy Res. 2012 Aug 28;1(1):32. doi: 10.1186/2045-4015-1-32. https://ijhpr.biomedcentral.com/articles/10.1186/2045-4015-1-32 .2045-4015-1-32 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Henao R, Murray J, Ginsburg G, Carin L, Lucas JE. Patient clustering with uncoded text in electronic medical records. AMIA Annu Symp Proc. 2013;2013:592–9. http://europepmc.org/abstract/MED/24551361 . [PMC free article] [PubMed] [Google Scholar]
- 12.Sewitch MJ, Leffondré K, Dobkin PL. Clustering patients according to health perceptions: relationships to psychosocial characteristics and medication nonadherence. J Psychosom Res. 2004 Mar;56(3):323–32. doi: 10.1016/S0022-3999(03)00508-7.S0022399903005087 [DOI] [PubMed] [Google Scholar]
- 13.Webster C. EHR business process management: from process mining to process improvement to process usability. Healthcare Systems Process Improvement Conference; February 20, 2012; Las Vegas, USA. 2012. Feb 20, http://wareflo.com/HSPI2012/ehr-bpm-process-mining-webster-2012-shs-conf.pdf . [Google Scholar]
- 14.Gerteis J, Izrael D, Deitz D, LeRoy L, Ricciardi R, Miller T, Basu J. Multiple Chronic Conditions Chartbook. Rockville, MD: Agency for Healthcare Research and Quality; 2014. Apr, Healthcare Utilization and Costs; pp. 7–14. [Google Scholar]
- 15.Rennert G, Peterburg Y. Prevalence of selected chronic diseases in Israel. Isr Med Assoc J. 2001 Jun;3(6):404–8. http://www.ima.org.il/IMAJ/ViewArticle.aspx?year=2001&month=06&page=404 . [PubMed] [Google Scholar]
- 16.Karpati T, Cohen-Stavi CJ, Leibowitz M, Hoshen M, Feldman BS, Balicer RD. Towards a subsiding diabetes epidemic: trends from a large population-based study in Israel. Popul Health Metr. 2014 Oct 30;12(1):32. doi: 10.1186/s12963-014-0032-y. https://pophealthmetrics.biomedcentral.com/articles/10.1186/s12963-014-0032-y .32 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Israel Center for Disease Control. Israel: Ministry of Health; 2017. Highlights of Health in Israel 2016 https://www.health.gov.il/publicationsfiles/highlights_of_health_in_israel2016.pdf . [Google Scholar]
- 18.Fayyad U, Piatetsky-Shapiro G, Smyth P. Advances in knowledge discovery and data mining. CA, USA: American Association for Artificial Intelligence; 1996. From data mining to knowledge discovery: an overview. [Google Scholar]
- 19.Fayyad U, Piatetsky-Shapiro G, Smyth P. The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM. 1996 Nov;39(11):27–34. doi: 10.1145/240455.240464. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.27.2315&rep=rep1&type=pdf . [DOI] [Google Scholar]
- 20.Frawley WJ, Piatetsky-Shapiro G, Matheus Cj. Knowledge discovery in databases: an overview. AI Mag. 1992 Sep;13(3):57–70. doi: 10.1609/aimag.v13i3.1011. [DOI] [Google Scholar]
- 21.Zaki MJ, Meira Jr W. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge: Cambridge University Press; 2014. May, [Google Scholar]
- 22.Mitchell TM. Machine Learning. New York, NY, USA: McGraw-Hill, Inc; 1997. [Google Scholar]
- 23.National Heart Lung and Blood Institute. [2018-10-11]. The Practical Guide Identification, Evaluation, and Treatment of Overweight and Obesity in Adults https://www.nhlbi.nih.gov/files/docs/guidelines/prctgd_c.pdf .
- 24.Reid RJ, Roos NP, MacWilliam L, Frohlich N, Black C. Assessing population health care need using a claims-based ACG morbidity measure: a validation analysis in the Province of Manitoba. Health Serv Res. 2002 Oct;37(5):1345–64. doi: 10.1111/1475-6773.01029. http://europepmc.org/abstract/MED/12479500 . [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.WHO Collaborative Centre for Drug Statistics Methodology. 2017. [2018-10-11]. Guidelines for ATC classification and DDD assignment 2018 https://www.whocc.no/filearchive/publications/guidelines.pdf .
- 26.Ben-Gal I. Outlier Detection. In: Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook. New York, NY: Springer; 2010. pp. 117–30. [Google Scholar]
- 27.Leys C, Ley C, Klein O, Bernard P, Licata L. Detecting outliers: do not use standard deviation around the mean, use absolute deviation around the median. J Exp Soc Psychol. 2013 Jul;49(4):764–66. doi: 10.1016/j.jesp.2013.03.013. [DOI] [Google Scholar]
- 28.Bansal R, Gaur N, Singh S. Outlier Detection: Applications and techniques in Data Mining. The 6th International Conference - Cloud System and Big Data Engineering (Confluence); January 14-15, 2016; Noida, India. 2016. https://ieeexplore.ieee.org/document/7508146 . [DOI] [Google Scholar]
- 29.Rahm E, Do HH. Data cleaning: problems and current approaches. Bull Tech Comm Data Eng. 2000 Jan;23(4):3–13. https://pdfs.semanticscholar.org/6046/770d1c3e08edfdd39bdb57fccaca84f5139c.pdf . [Google Scholar]
- 30.Han J, Kamber M, Pei J. Data Mining: Concepts and Techniques. Canada: Elsevier Science; 2011. [Google Scholar]
- 31.Escalante HJ. Semanticscholar. 2005. [2018-10-11]. A Comparison of Outlier Detection Algorithms for Machine Learning https://pdfs.semanticscholar.org/cf06/9b7460ce1b5a0434a6a19f420544a780f35d.pdf .
- 32.Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv. 2009 Jul 1;41(3):1–58. doi: 10.1145/1541880.1541882. [DOI] [Google Scholar]
- 33.Ahmad T, Pencina MJ, Schulte PJ, O'Brien E, Whellan DJ, Piña IL, Kitzman DW, Lee KL, O'Connor CM, Felker GM. Clinical implications of chronic heart failure phenotypes defined by cluster analysis. J Am Coll Cardiol. 2014 Oct 28;64(17):1765–74. doi: 10.1016/j.jacc.2014.07.979. https://linkinghub.elsevier.com/retrieve/pii/S0735-1097(14)05897-5 .S0735-1097(14)05897-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Fodor IK. A survey of dimension reduction techniques. United States of America: Lawrence Livermore National Lab, CA (US); 2002. May 9, [Google Scholar]
- 35.Cunningham P. University College Dublin. 2007. Dimension Reduction http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.1478&rep=rep1&type=pdf.
- 36.World Health Organization . Use of Glycated Haemoglobin (HbA1c) in the Diagnosis of Diabetes Mellitus: Abbreviated Report of a WHO Consultation. Geneva: World Health Organization; 2011. [PubMed] [Google Scholar]
- 37.Rokach L. A survey of clustering algorithms. In: Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook, 2nd ed. New York, NY: Springer; 2010. pp. 269–98. [Google Scholar]
- 38.MacQueen JB. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; 1967; Berkeley, California. 1967. pp. 281–97. [Google Scholar]
- 39.Ester M, Kriegel HP, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; August 2-4, 1996; Portland, Oregon. 1996. pp. 226–31. https://www.aaai.org/Library/KDD/kdd96contents.php . [Google Scholar]
- 40.Ankerst M, Breunig M, Kriegel H, Sander J. Optics: ordering points to identify the clustering structure. SIGMOD '99 Proceedings of the ACM SIGMOD international conference on Management of data; ACM SIGMOD international conference on Management of data; May 31-June 3, 1999; New York. New York, NY, USA: ACM; 1999. pp. 49–60. https://dl.acm.org/citation.cfm?id=304187 . [Google Scholar]
- 41.Madeira SC, Oliveira AL. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform. 2004;1(1):24–45. doi: 10.1109/TCBB.2004.2. [DOI] [PubMed] [Google Scholar]
- 42.Zhao H, Wee-Chung Liew A, Wang DZ, Yan H. Biclustering analysis for pattern discovery: current techniques, comparative studies and applications. Curr Bioinform. 2012 Mar 1;7(1):43–55. doi: 10.2174/157489312799304413. [DOI] [Google Scholar]
- 43.Aggarwal C, Han J, editors. Frequent Pattern Mining. Switzerland: Springer International Publishing; 2014. [Google Scholar]
- 44.Ray S, Turi R. Determination of number of clusters in k-means clustering and application in colour image segmentation. The 4th international conference on advances in pattern recognition and digital techniques; 1999; Calcutta, India. 1999. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.587.3517 . [Google Scholar]
- 45.Card S, Mackinlay J, Shneiderman B, editors. Readings in Information Visualization: Using Vision to Think. USA: Elsevier Science; 1999. [Google Scholar]
- 46.van der Aalst W. Process Mining. Berlin Heidelberg: Springer; 2016. [Google Scholar]
- 47.Culley L, Hudson N, Rapport F. Using focus groups with minority ethnic communities: researching infertility in British South Asian communities. Qual Health Res. 2007 Jan;17(1):102–12. doi: 10.1177/1049732306296506. [DOI] [PubMed] [Google Scholar]
- 48.Kidd PS, Parshall MB. Getting the focus and the group: enhancing analytical rigor in focus group research. Qual Health Res. 2000 May;10(3):293–308. doi: 10.1177/104973200129118453. [DOI] [PubMed] [Google Scholar]
- 49.King G, Verba S, Keohane RO. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press; 1994. [Google Scholar]
- 50.Stewart DW, Shamdasani PN. Focus Groups: Theory and Practice (Applied Social Research Methods), 3rd ed. USA: SAGE Publications; 2018. [Google Scholar]
- 51.Lewis S. Qualitative inquiry and research design: choosing among five approaches. Health Promot Pract. 2015 Apr 2;16(4):473–75. doi: 10.1177/1524839915580941. [DOI] [Google Scholar]
- 52.Sandelowski M. Whatever happened to qualitative description? Res Nurs Health. 2000 Aug;23(4):334–40. doi: 10.1002/1098-240x(200008)23:4<334::aid-nur9>3.0.co;2-g.10.1002/1098-240X(200008)23:4<334::AID-NUR9>3.0.CO;2-G [DOI] [PubMed] [Google Scholar]