Abstract
Background: We aimed to identify the indicators of healthcare fraud and abuse in general physicians’ drug prescription claims, and to identify a subset of general physicians that were more likely to have committed fraud and abuse.
Methods: We applied a data mining approach to a dataset of private sector general physicians’ prescription claims from a major health insurance organization. The approach involved 5 steps: clarifying the nature of the problem and objectives, data preparation, indicator identification and selection, cluster analysis to identify suspect physicians, and discriminant analysis to assess the validity of the clustering approach.
Results: Thirteen indicators were developed in total. Over half of the general physicians (54%) were ‘suspects’ of conducting abusive behavior. The results also identified 2% of physicians as suspects of fraud. Discriminant analysis suggested that the indicators demonstrated adequate performance in the detection of physicians who were suspect of perpetrating fraud (98%) and abuse (85%) in a new sample of data.
Conclusion: Our data mining approach will help health insurance organizations in low- and middle-income countries (LMICs) to streamline auditing by targeting suspect groups rather than routinely auditing all physicians.
Keywords: Healthcare, Fraud, Abuse, Insurance, Data Mining, General Physician
Background
Healthcare expenditure is rising rapidly in many countries. Globally, around 10% of gross domestic product was spent on health in 2011.1 Unfortunately, not all of this money is spent well, and there are many sources of inefficiency. An important fraction of it, up to 10% of total health expenditure, is lost to fraud and abuse, amounting to billions of dollars per year.2
Fraud has been defined as an intentional deception or misrepresentation made by a person or an entity with the knowledge that the deception could result in some kind of unauthorized benefit to that person or entity.3 The term ‘abuse’ may be used to describe problematic behavior by a physician or healthcare organization that is not clearly against the law, or where certain elements of the fraud definition (such as knowing deception) are missing.4,5 Abuse is the concept closest to fraud and usually accompanies it. Nevertheless, the boundaries of fraud overlap with those of abuse and, to some extent, with unprofessional behavior, negligence and corruption.5
Healthcare fraud can be classified into categories of provider fraud, consumer fraud (patient or insured), and insurer or payer fraud.6 Provider healthcare fraud may be committed by individuals (eg, physicians, dentists) or by provider organizations (eg, hospitals). Sometimes providers engage in fraudulent behaviors that involve other service providers (eg, diagnostic services) or pharmaceutical and medical device manufacturers by receiving kickback payments. Provider related fraudulent behaviors may also involve other groups, eg, patients or insurer representatives.5,6
Interventions to combat healthcare fraud and abuse can be classified into the 3 categories of interventions aimed at preventing, detecting, and responding to fraud and abuse.5 The focus of this paper is on the interventions designed to detect fraud and abuse in physician behavior. Such interventions involve identifying past and new cases of fraud as quickly as possible after a fraud has been committed.
Traditional methods of detecting healthcare fraud and abuse are based on auditing procedures that are often time-consuming and ineffective in practice. Such paper-based claim handling is still the dominant picture in many low- and middle-income countries (LMICs).7,8 Often thousands of healthcare claims are handled by a few auditors who are expected to review all of them. In reality, auditors have little time for each claim and focus on particular characteristics of a claim, without attending to the relationships among all the variables that together provide a comprehensive picture of a physician’s behavior.
Certain physician payment methods act as a risk factor for abuse, and perhaps fraud, in healthcare. Under fee-for-service payment systems where a third-party payer exists, physicians have an incentive to increase the number of services.9 This might result in substandard care and unnecessary services, which have been categorized as examples of abuse.5,10 If these payment mechanisms are accompanied by ineffective auditing procedures, as is frequently observed in LMICs, the resulting loss of health resources and potential harm to patients might be substantial.
Data mining, as a key part of ‘knowledge discovery from databases’ (KDD), involves the use of methods that explore the data, develop relevant models and discover previously unknown patterns in the data.11 Data mining can help third-party payers such as health insurance organizations to extract useful knowledge from thousands of claims and identify a smaller subset of the claims or claimants for further assessment and scrutiny for fraud and abuse. This way, the data mining approaches are part of a more efficient and effective auditing system.12
Objectives
Our study aimed to demonstrate how data mining can be used in the real context of a health insurance organization in a middle-income country. We applied data mining approaches to general physicians’ outpatient claims submitted to the Social Security Organization (SSO) in Iran. We aimed to identify indicators of healthcare fraud and abuse in general physicians’ drug prescription claims, and to identify a subset of general physicians that were more likely to have committed fraud and abuse.
The paper is organized as follows. We first briefly review previously published studies that applied data mining to detect fraud and abuse in healthcare. Then we explain the role of the SSO in purchasing healthcare services in Iran. In the methods section, we describe the setting and the 5 general steps we followed to perform the data mining process. Finally, we present the findings, explain their implications and discuss the limitations and advantages of our methods.
Review of Literature
Computerized systems that process predefined simple rules are increasingly implemented for identifying errors and inappropriate billing, such as erroneous or incomplete data input, duplicate claims and ineligible claims. Because these systems rely on simple rules, they may not be capable of modeling abusive and fraudulent behavior.6 They also often fail to detect fraudulent claims that are backed by the expected documentation, as well as new patterns of fraud and abuse.
More sophisticated antifraud systems are based on combining statistical methods and machine learning and are generally known as KDD. They involve several steps starting from understanding the setting and environment, setting clear objectives, understanding the data and the nature of the claims, cleaning, preparation and transformation of the data, selecting appropriate data mining approaches, conducting data mining algorithms, and evaluation and interpretation of the findings.11
Data mining techniques used in healthcare fraud detection are divided into 2 general approaches: ‘supervised’ and ‘unsupervised’ methods.13 Supervised data mining usually involves methods that use samples of previously known fraudulent and non-fraudulent records. These records are used to construct models that assign a new observation to one of the 2 groups. Supervised methods require confidence in the correct categorization of the records. Moreover, they are mainly useful for detecting previously known patterns of fraud and abuse, so the models should be updated regularly to reflect ‘innovations’ in fraudulent behavior and changes in regulations and settings. Liou et al14 used supervised methods to review claims submitted to Taiwan’s National Health Insurance for diabetic outpatient services. They compared 3 data mining methods (logistic regression, neural networks, and classification trees) for the detection of fraudulent or abusive behavior.13 As another example, one study used supervised data mining to check whether providers followed previously defined clinical pathways, hypothesizing that deviations from these pathways might indicate potentially fraudulent or abusive behavior.15
In contrast, unsupervised approaches do not require prior knowledge of fraudulent behavior. They often involve comparing individual claims with the norms observed in the sample of claims under analysis. Unsupervised methods typically include segmentation techniques (such as clustering methods and anomaly detection) and association rule mining methods.6,13 Some examples of applying unsupervised methods are presented below.8 One study applied clustering methods to general physicians’ practice data from the national health insurance in Taiwan.16 They used ten indicators (features or attributes) to cluster physicians’ practice data, ranked critical clusters using expert opinions about the importance of clusters for health expenditures, and finally derived managerial guidance from expert opinions about the characteristics of each critical cluster.16 Another study applied a 2-phase model to identify abusive internal medicine outpatient clinics in Korea.17 This study gathered data from practitioner outpatient care claims submitted to a health insurance organization. The authors calculated a risk score from 38 indicators expressing the likelihood of abuse by each provider, and then classified providers using a decision tree.17 Shan et al18 applied a local density-based outlier detection method to optometrists’ claims drawn from Medicare Australia and validated the results against the optometrists’ compliance history and feedback from experts.18 In another study, association rule mining was applied to the billing patterns of a group of specialist physicians to detect potentially fraudulent behavior. Association rules were identified from specialist billing records, and specialists whose claims frequently broke the rules were flagged as potentially at high risk of fraud and abuse.19
Applying supervised models requires knowing whether each claim is fraudulent or abusive in order to build the analytical models. Medical insurance organizations usually do not have such information about individual claims. Another limitation of supervised methods is that, as soon as fraudsters become aware of a particular detection method, they try to adapt their strategies to avoid detection.20 Unsupervised approaches, in theory, can identify new types of fraud or abuse that might not have been previously documented, and hence are relatively immune to both concerns.
There are also hybrid methods that combine unsupervised and supervised methods. For example, one study used a 3-step methodology for insurance fraud detection. They applied unsupervised clustering methods to insurance claims and developed a variety of (labeled) clusters. Then they used a supervised classification tree algorithm to generate rules allocating each record to a cluster. Finally, they identified the most effective ‘rules’ for future identification of abusive behaviors.21
Methods
Setting
There are no estimates of fraud and abuse losses in Iran’s healthcare system.22 Still, international evidence suggests that such losses are likely to be an important fraction of healthcare costs. Iran has 3 major social health insurers: the Iran Health Insurance Organization (formerly known as the Medical Services Insurance Organization), SSO and the Military Forces Social Security Organization. SSO implements compulsory coverage of formal sector workers and their families, and voluntary coverage for self-employed persons and their families. SSO covered over 33% of Iran’s population in 2010 (about 25 million people).23
SSO is financed by contributions from the insured, employers and the government, equivalent to 7%, 20%, and 3% of the insured person’s monthly salary, respectively. Exemptions and premium caps also apply. After the Ministry of Health (MoH), SSO is the second largest provider of treatment services, with a network of 70 hospitals (about 9% of all hospitals) and about 280 clinics and polyclinics around the country.24 Thirty percent of SSO resources are allocated to these centers.24 The major part of SSO financial resources is allocated to purchasing curative services from private and governmental healthcare institutions on a fee-for-service basis. In 2012, the SSO had contractual agreements with over 4500 clinics and health centers, 700 hospitals and 27000 physician and dentist practices.24 From April 2011 to March 2012, the SSO processed 324 million outpatient and 3.4 million inpatient claims.24
Despite the huge number of paper claims handled, and the large amount of money spent on purchasing curative services by SSO and the other insurance organizations in Iran, the auditing system is mainly manual and retrospective. Auditors are expected to review all received claims, which in practice is time-consuming and ineffective. In 2008, SSO started a new program to replace paper claims with electronic claims. Para-clinic centers such as private pharmacies and laboratories deliver electronic claims to SSO, whereas general physicians and specialists working in private offices still deliver paper claims. The growing use of electronic claims is an opportunity to apply data mining approaches to improve auditing procedures.
Data
The data used for this study originate from the insured patients’ visits of the physicians and physicians’ drug prescriptions, as we describe here. An insured patient carries an insurance logbook with them when they visit a physician. The physician writes and signs the prescription orders, if needed, on a specified page of the logbook. Then the patient takes the signed ‘prescription’ to the pharmacy to be dispensed, and pays the coinsurance rate of about 30% of the prescription costs. The pharmacy submits the dispensed ‘prescription’ to the insurer for the reimbursement of the remaining 70% of the costs. These prescriptions are then collected by the insurer in computerized files. We received the computerized data of drug prescriptions from the SSO for this study.
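As a small worked illustration of the reimbursement flow just described, the sketch below splits a claim’s cost between the patient and the insurer. It assumes a flat 30% coinsurance rate (the text says ‘about 30%’, so actual shares may differ) and uses the claim total of 437000 shown for prescription 123456 in Table 2; the function name is hypothetical.

```python
from typing import Tuple

# Assumption: a flat coinsurance rate; the text reports "about 30%".
COINSURANCE_RATE = 0.30


def split_prescription_cost(total_cost: int, rate: float = COINSURANCE_RATE) -> Tuple[int, int]:
    """Return (patient_share, insurer_share) in whole currency units.

    The patient pays the coinsurance at the pharmacy; the pharmacy then
    claims the remainder from the insurer.
    """
    patient_share = round(total_cost * rate)
    # Taking the remainder guarantees the two shares sum to the claim total.
    insurer_share = total_cost - patient_share
    return patient_share, insurer_share


# Claim total for prescription 123456 in Table 2.
print(split_prescription_cost(437000))  # (131100, 305900)
```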
Design
We used data mining tasks to detect probable fraudulent and abusive behavior of general physicians working in solo private practices.
This study was conducted in 5 steps. The first step was to understand the nature of the problem and clarify the study objectives. The second step focused on understanding the data and involved a large amount of work to prepare it for data mining. In the third step, we identified and selected the indicators (features, attributes). In the fourth step, we applied clustering methods to identify a critical cluster of physicians. Finally, in the fifth step, we applied discriminant analysis to build a predictive model for determining the group membership of general physicians based on new data. We interpreted the results using experts’ viewpoints and drew policy implications from them.
Step 1: Understanding the Nature of the Problem
In our discussions with health insurance managers, they expressed concern about 2 types of healthcare fraud and abuse among general physicians that were particularly time-consuming to detect.
The first behavior of concern was an abuse pattern: a physician might prescribe more medicines than needed, including injectable medicines, antibiotics, corticosteroids or expensive medicines, to keep patients satisfied. This way the physician might attract more patients and indirectly increase their revenue. The physician might also prescribe injectable medicines and benefit from an injection service that might operate beside the physician’s office.
The second concern was about a fraudulent behavior that involved collusions between a physician and a pharmacy: The fraudster physician may remove blank prescription pages from the patient’s logbook, while the patient is unaware. This is usually additional to the genuine drug prescription order that is written in the logbook and handed over to the patient. Later on, the physician and the colluding pharmacy add (expensive) medicines to the blank page and produce a bogus prescription order. Then the pharmacy submits the bogus drug prescription claim to the insurer for reimbursements and splits the fraud money with the colluding physician.
Step 2: Understanding the Data and Data Preparation
We collected data from the provincial branch of the SSO in Lorestan province, Iran, which contracted 454 general physicians, specialists and dentists and 283 private sector para-clinic centers (eg, pharmacies, laboratories, x-ray and physiotherapy offices).24 The SSO covered about 26% of the Lorestan population (over 450000 people).23
We collected claim data on all contracted general physicians (ie, 205 general physicians) that included 612804 outpatient drug prescription claims in year 2011. We also collected summary physician activities and specification data, and cross-checked the summary reports with the prescription details to ensure data completeness. Examples of available data about physicians and claims are provided in Table 1 and Table 2.
Table 1. Examples of Physician Raw Data Based on Data Provided by the SSO Local Branch in Lorestan Province, Iran.
Physician Identifier | 1 | 2 |
Gender | Male | Female |
City or district | Khoram abad | Doroud |
Number of months claims delivered | 12 | 11 |
Number of visit claims (yearly) | 12 000 | 4152 |
Total annual costs of visit claims | 268 100 000 | 93 336 600 |
Number of prescription drug claims | 13 450 | 3607 |
Total annual cost of prescription claims | 320 125 727 | 68 724 149 |
Abbreviation: SSO, Social Security Organization.
Table 2. Examples of Physicians’ Prescribing Claim Raw Data Based on Data Provided by the SSO in Lorestan Province, Iran.
Drug Prescription Identifier | 123456 | 123456 | 123456 | 123456 | 123457 | 123457 |
Pharmacy identifier | 35 | 35 | 35 | 35 | 87 | 87 |
Month of the year | 5 | 5 | 5 | 5 | 11 | 11 |
Physician identifier | 1 | 1 | 1 | 1 | 5 | 5 |
Patient logbook identifier | 1234567890 | 1234567890 | 1234567890 | 1234567890 | 9876543210 | 9876543210 |
Number of prescription items in a claim | 4 | 4 | 4 | 4 | 2 | 2 |
Total cost of a claim | 437000 | 437000 | 437000 | 437000 | 23000 | 23000 |
Code of individual drug on the prescription | 10 | 958 | 400 | 525 | 80001 | 82950 |
Number of drugs requested | 20 | 1 | 1 | 1 | 1 | 1 |
The cost of the individual drug | 14500 | 16500 | 12000 | 7000 | 8000 | 15000 |
Abbreviation: SSO, Social Security Organization.
We linked physician data and claim data using unique physician codes. Instead of "claims," we selected "physicians" as the unit of analysis as we were interested in identifying physicians that were suspicious of conducting fraud or abuse.
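The linkage and change of unit of analysis described above can be sketched as a join on the physician code followed by per-physician aggregation. The records, field names and the choice of summary (claim count and mean claim cost) below are hypothetical illustrations loosely modelled on Tables 1 and 2, not the study’s actual pipeline; the sketch also reports claims whose physician code has no match, since those would need manual checking.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative, loosely modelled on Tables 1 and 2.
physicians = {
    1: {"gender": "Male", "city": "Khoram abad"},
    2: {"gender": "Female", "city": "Doroud"},
}
claims = [
    {"prescription_id": 1001, "physician_id": 1, "total_cost": 30000},
    {"prescription_id": 1002, "physician_id": 1, "total_cost": 20000},
    {"prescription_id": 1003, "physician_id": 2, "total_cost": 25000},
    {"prescription_id": 1004, "physician_id": 9, "total_cost": 15000},  # no matching physician
]


def link_and_aggregate(physicians, claims):
    """Join claims to physicians by physician code and return one row per physician."""
    claims_by_physician = defaultdict(list)
    for claim in claims:
        claims_by_physician[claim["physician_id"]].append(claim)

    # Claims whose physician code is absent from the physician file.
    unmatched = [c["prescription_id"] for pid, cs in claims_by_physician.items()
                 if pid not in physicians for c in cs]

    table = {}
    for pid, attributes in physicians.items():
        own = claims_by_physician.get(pid, [])
        n_claims = len(own)
        mean_cost = sum(c["total_cost"] for c in own) / n_claims if n_claims else None
        table[pid] = {**attributes, "n_claims": n_claims, "mean_claim_cost": mean_cost}
    return table, unmatched


table, unmatched = link_and_aggregate(physicians, claims)
print(table[1]["n_claims"], table[1]["mean_claim_cost"], unmatched)  # 2 25000.0 [1004]
```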
Physician behavior in the public sector, where there is less incentive for fraud and abuse because there is little personal gain, might differ from behavior in the private sector. We identified and omitted 40 physicians from the analyses because they worked in the private and public sectors simultaneously, and omitted 1 physician because of a lack of data. We organized the data into 2 datasets. One included physician-level variables and indicators for 164 general physicians (Table 1). The second included 474897 drug prescription claims (Table 2). Other data preparation tasks included finding and correcting errors, managing missing values, restructuring the data (for example, converting tabular data to a transactional structure), computing and recoding variables, and anonymizing physicians’ personal data. Patients’ age and gender were omitted from the datasets because of a high percentage of missing values.
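The restructuring from tabular to transactional data mentioned above can be sketched as follows, assuming, as the repeated identifiers in Table 2 suggest, that each row of the claim file is one drug line item keyed by a prescription identifier. The rows reuse values from Table 2 (a subset of its columns); the field names and function name are illustrative assumptions.

```python
# Line-item rows as in Table 2: one row per drug on a prescription (subset of columns).
rows = [
    {"prescription_id": 123456, "pharmacy_id": 35, "month": 5, "physician_id": 1, "drug_code": 10},
    {"prescription_id": 123456, "pharmacy_id": 35, "month": 5, "physician_id": 1, "drug_code": 958},
    {"prescription_id": 123456, "pharmacy_id": 35, "month": 5, "physician_id": 1, "drug_code": 400},
    {"prescription_id": 123456, "pharmacy_id": 35, "month": 5, "physician_id": 1, "drug_code": 525},
    {"prescription_id": 123457, "pharmacy_id": 87, "month": 11, "physician_id": 5, "drug_code": 80001},
    {"prescription_id": 123457, "pharmacy_id": 87, "month": 11, "physician_id": 5, "drug_code": 82950},
]

# Prescription-level fields that repeat on every line item of the same prescription.
HEADER_FIELDS = ("pharmacy_id", "month", "physician_id")


def to_transactions(rows):
    """Collapse line items into one transaction per prescription identifier."""
    transactions = {}
    for row in rows:
        tx = transactions.get(row["prescription_id"])
        if tx is None:
            # The first line item supplies the prescription-level fields.
            tx = {field: row[field] for field in HEADER_FIELDS}
            tx["drug_codes"] = []
            transactions[row["prescription_id"]] = tx
        tx["drug_codes"].append(row["drug_code"])
    return transactions


transactions = to_transactions(rows)
print(transactions[123456]["drug_codes"])  # [10, 958, 400, 525]
```

One transaction per prescription is the shape that item-counting indicators (eg, items per claim) and association rule methods expect.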
As fraud detection is about finding outliers and extreme cases, we did not omit or modify outliers or extreme values; instead, we investigated unusual cases separately. We did not use statistical methods to handle missing values: if a variable or record had too many missing values, we omitted it.
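The omission rule above can be expressed as a two-pass filter: drop sparse variables first, then sparse records, with no imputation and no outlier trimming. The study does not say how many missing values counted as ‘too many’, so the 20% cut-off, the records and the function names below are hypothetical.

```python
# Hypothetical cut-off: the study omitted variables or records with "too many"
# missing values but does not report a threshold.
MAX_MISSING_SHARE = 0.2


def missing_share(values):
    """Proportion of None entries in an iterable of values."""
    values = list(values)
    return sum(v is None for v in values) / len(values) if values else 0.0


def drop_sparse(records, fields, max_share=MAX_MISSING_SHARE):
    """Drop sparse variables first, then sparse records; nothing is imputed."""
    kept_fields = [f for f in fields
                   if missing_share(r.get(f) for r in records) <= max_share]
    kept_records = [{f: r.get(f) for f in kept_fields} for r in records
                    if missing_share(r.get(f) for f in kept_fields) <= max_share]
    return kept_fields, kept_records


records = [
    {"physician_id": 1, "cost": 30000, "age": None},
    {"physician_id": 1, "cost": 20000, "age": None},
    {"physician_id": 2, "cost": None, "age": None},
    {"physician_id": 2, "cost": 25000, "age": 41},
    {"physician_id": 3, "cost": 15000, "age": None},
]
kept_fields, kept_records = drop_sparse(records, ["physician_id", "cost", "age"])
print(kept_fields, len(kept_records))  # ['physician_id', 'cost'] 4
```

In this toy example the mostly empty ‘age’ variable is dropped, echoing the omission of patients’ age and gender, and then the one record with a missing cost is dropped.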
Step 3: Indicators (Features) Creation and Selection
The fundamental logic of applying statistical methods and data mining for detecting fraud and abuse in healthcare claims is that fraudulent behavior of physicians leaves footprints. The success of data mining process is dependent on the creation and selection of indicators (features or attributes) representing the symptoms of fraud and abuse in claims. Distinguishing reliable bits of information from "fool’s gold" depends on selecting appropriate indicators based on a good understanding of the context.
The indicators were created through logical inference about the fraudulent behavior of physicians and were validated in expert interviews. ‘Logical inference’ refers to questions such as: if a physician wanted to behave fraudulently and submit a false claim to SSO, what would they most likely do? We considered all possible ways and discussed them with experts; Box 1 gives an example of such an inference. To validate the indicators, we interviewed 15 individuals at the provincial and national levels of SSO: claim auditors (8 interviewees), managers (5 interviewees) and physicians (2 interviewees). All interviewees had at least ten years of related work experience, and all managers were also general physicians. In addition, we selected indicators from similar studies.7,14
Box 1: An example of a "logical inference" to identify the potential patterns of fraudulent behaviors
"A physician might collude with a pharmacy in fraudulent behavior. The physician might obtain an empty page of a patient’s insurance logbook and fill it in with a prescription that will not be dispensed by the pharmacy, and the pharmacy would claim this fraudulent prescription from the insurer. One would expect such collusion to involve links between the doctor and a few pharmacies, to reduce the chances of identification. Hence, repeated dispensing patterns will be of interest. It might also involve prescribing more expensive medicines. Additionally, the fraudster would try to reduce their chances of being identified; for example, they might ensure that such prescriptions include no more than 3 medicinal items, as insurance organizations are more likely to assess a prescription that includes 4 or more items…."
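The inference in Box 1 translates naturally into physician-level indicators. The sketch below computes two of them for a single physician: the share of prescriptions whose patient-pharmacy pair recurs, and the share of prescriptions with 3 or fewer items. The study does not give formulas for its ‘reduplicative patients-pharmacy’ indicators, so these definitions, the field names and the toy data are hypothetical operationalizations.

```python
from collections import Counter


def box1_indicators(prescriptions):
    """Two hypothetical physician-level indicators inspired by Box 1.

    prescriptions: one dict per prescription written by a single physician,
    with 'patient_id', 'pharmacy_id' and 'n_items' keys.
    """
    n = len(prescriptions)
    pair_counts = Counter((p["patient_id"], p["pharmacy_id"]) for p in prescriptions)
    # Share of prescriptions whose patient-pharmacy pair recurs (repeated dispensing pattern).
    repeated = sum(1 for p in prescriptions
                   if pair_counts[(p["patient_id"], p["pharmacy_id"])] > 1)
    # Share of prescriptions kept at 3 items or fewer (below the 4-item audit trigger in Box 1).
    small = sum(1 for p in prescriptions if p["n_items"] <= 3)
    return {
        "pct_repeat_patient_pharmacy": 100 * repeated / n,
        "pct_at_most_3_items": 100 * small / n,
    }


example = [
    {"patient_id": "A", "pharmacy_id": 35, "n_items": 2},
    {"patient_id": "A", "pharmacy_id": 35, "n_items": 3},
    {"patient_id": "B", "pharmacy_id": 87, "n_items": 5},
    {"patient_id": "C", "pharmacy_id": 35, "n_items": 1},
]
print(box1_indicators(example))
# {'pct_repeat_patient_pharmacy': 50.0, 'pct_at_most_3_items': 75.0}
```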
Step 4: Clustering
We selected 92% of claims (436622 claims), covering 11 months of the study year. We calculated the indicator values for each physician and standardized them using Z scores. We then used a hierarchical clustering method to segment physicians and identify clusters of physicians that were suspect for abuse and fraud. The analysis used Euclidean distance to measure the distance between data points and clusters. The optimal number of clusters was the one that maximized the silhouette coefficient (a clustering validity index).16 The data points were clustered so that physicians within a cluster tended to be similar to each other but dissimilar to physicians in other clusters. We did not have a predetermined number of clusters or a target group of physicians; rather, the data were used to identify the clusters and potential suspect groups (ie, an unsupervised approach).
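The clustering step can be sketched in plain Python as below. Z-score standardization, Euclidean distance, hierarchical (agglomerative) clustering and choosing the number of clusters by the maximum mean silhouette coefficient follow the text; the linkage rule is not reported, so average linkage is our assumption, as are the function names, the candidate range of cluster counts and the toy indicator values. A real analysis would more likely rely on an established statistics or machine learning library.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each indicator (column) to mean 0 and SD 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]


def average_linkage(points, k):
    """Agglomerative clustering with Euclidean distance; average linkage is assumed."""
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = (sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                     / (len(clusters[a]) * len(clusters[b])))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge the two closest clusters (a < b)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels, dist


def mean_silhouette(labels, dist):
    """Average silhouette width; points in singleton clusters score 0 by convention."""
    scores = []
    for i, own in enumerate(labels):
        same = [dist[i][j] for j, lab in enumerate(labels) if lab == own and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(mean(dist[i][j] for j, lab in enumerate(labels) if lab == other)
                for other in set(labels) if other != own)  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_k(raw_rows, candidate_ks=range(2, 6)):
    """Return (k, silhouette, labels) for the k with the highest mean silhouette."""
    z = zscore_columns(raw_rows)
    best = None
    for k in candidate_ks:
        labels, dist = average_linkage(z, k)
        score = mean_silhouette(labels, dist)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


# Toy physician-level indicators: [items per claim, injections per claim].
indicators = [[3.5, 0.8], [3.6, 0.7], [3.4, 0.9],
              [4.7, 1.5], [4.6, 1.6], [4.8, 1.4]]
k, score, labels = choose_k(indicators)
print(k)  # 2
```

Note that the candidate cluster counts must be smaller than the number of physicians being clustered.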
As a result of the clustering process, the indicators that did not contribute to the clustering were omitted. Then steps 3 and 4 were repeated until the outputs of the clustering were satisfactory. The complete list of the remaining indicators is in Table 3. Nine indicators of abuse and 5 indicators of fraud generated the optimal clustering results (one indicator was used both for fraud and abuse detection).
Table 3. Indicator Statistics for the Clusters Used to Identify Physicians Suspected of Perpetrating Abuse or Fraud.
Indicator | Cluster 1 Mean | Cluster 1 SD | Cluster 2 Mean | Cluster 2 SD |
Abuse detection: cluster 1, 46% (75 physicians); cluster 2, 54% (89 physicians)ᵃ |
Percentage of patients visited more than once in a month | 16.20 | 5.46 | 21.72 | 11.20 |
Average number of prescribed drug items per claim | 3.54 | 0.57 | 4.66 | 0.57 |
Average cost of a drug prescription claimᵇ | 23827 | 5398 | 29668 | 5662 |
Ratio of prescriptions for the 5 expensive antibiotics to all physician claims | 0.25 | 0.11 | 0.39 | 0.17 |
Ratio of injection prescriptions to all physician claims | 0.81 | 0.30 | 1.53 | 0.42 |
Ratio of total injection prescriptions to all physician claims | 1.55 | 0.74 | 2.92 | 0.83 |
Ratio of total prescribed antibiotics to all physician claims | 0.68 | 0.18 | 1.06 | 0.19 |
Ratio of injected antibiotics to all physician claims | 0.19 | 0.11 | 0.44 | 0.15 |
Ratio of injected corticosteroid prescriptions to all physician claims | 0.26 | 0.13 | 0.48 | 0.19 |
Fraud detection: cluster 1, 98% (160 physicians); cluster 2, 2% (4 physicians)ᵃ |
Percentage of reduplicative patients | 30.05 | 10.58 | 41.87 | 15.23 |
Percentage of reduplicative patients-pharmacy | 22.92 | 10.43 | 37.49 | 18.28 |
Percentage of reduplicative patients-pharmacy in a month | 5.70 | 3.19 | 10.09 | 4.88 |
Average cost of a drug prescription claimᵇ | 26656 | 5902 | 40613 | 4756 |
Ratio of claims referred to a high-cost pharmacy | 0.08 | 0.24 | 4.16 | 1.16 |
ᵃ Cluster 2 indicates the suspect group.
ᵇ This indicator was used in both fraud and abuse detection.
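The relative differences between the suspect and healthy clusters can be checked directly from the cluster means in Table 3. The short script below does this arithmetic; the dictionary labels are our own shorthand for the table rows.

```python
# Cluster means from Table 3: (cluster 1 = healthy, cluster 2 = suspect).
TABLE3_MEANS = {
    "abuse: average claim cost": (23827, 29668),
    "fraud: average claim cost": (26656, 40613),
    "fraud: ratio of claims to high-cost pharmacy": (0.08, 4.16),
}


def compare_clusters(healthy, suspect):
    """Return (fold difference, percentage difference) of suspect vs healthy means."""
    fold = suspect / healthy
    return fold, (fold - 1) * 100


for name, (healthy, suspect) in TABLE3_MEANS.items():
    fold, pct = compare_clusters(healthy, suspect)
    print(f"{name}: {fold:.2f}x ({pct:+.0f}%)")
# abuse: average claim cost: 1.25x (+25%)
# fraud: average claim cost: 1.52x (+52%)
# fraud: ratio of claims to high-cost pharmacy: 52.00x (+5100%)
```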
Step 5: Discriminant Analysis
Clustering the physicians on 11 months of data produced 2 clusters, which we labeled healthy or suspect (of fraud or abuse) based on their characteristics (ie, the calculated indicators). We then applied discriminant analysis to the remaining data to assess whether the identified sets of indicators produced similar categorizations of physicians into suspect and healthy groups. The discriminant analysis was applied to data from the 12th month of the study year (about 8% of physician claims). It was based on developing linear combinations of the predictor variables (indicators) that best discriminate between the clusters.25
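The text does not say which discriminant procedure or software was used. As an illustration only, the sketch below implements a classic two-group Fisher linear discriminant, which matches the description of finding the linear combination of indicators that best separates the clusters: it is fitted on cluster labels from the 11-month data and then used to assign month-12 indicator vectors. The equal-prior midpoint cut-off, the function names and the toy data are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def column_means(group):
    return [sum(col) / len(group) for col in zip(*group)]


def fit_fisher(group1, group2):
    """Two-group Fisher discriminant: w = S_pooled^-1 (m1 - m2), midpoint cut-off."""
    m1, m2 = column_means(group1), column_means(group2)
    p = len(m1)
    scatter = [[0.0] * p for _ in range(p)]
    for group, m in ((group1, m1), (group2, m2)):
        for x in group:
            d = [xi - mi for xi, mi in zip(x, m)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += d[i] * d[j]
    dof = len(group1) + len(group2) - 2
    pooled = [[v / dof for v in row] for row in scatter]  # pooled within-group covariance
    w = solve(pooled, [a - b for a, b in zip(m1, m2)])
    cutoff = (dot(w, m1) + dot(w, m2)) / 2  # equal priors assumed
    return w, cutoff


def predict(x, w, cutoff):
    """Cluster 1 projects above the cut-off by construction of w."""
    return 1 if dot(w, x) >= cutoff else 2


# Standardized indicators labelled by clustering on months 1-11 (made-up values).
healthy = [[0.2, 0.3], [0.4, 0.1], [0.1, 0.2], [0.3, 0.4]]
suspect = [[1.5, 1.4], [1.6, 1.2], [1.3, 1.5], [1.4, 1.7]]
w, cutoff = fit_fisher(healthy, suspect)
# Month-12 indicator vectors for two physicians.
predictions = [predict(x, w, cutoff) for x in ([0.25, 0.2], [1.45, 1.5])]
print(predictions)  # [1, 2]
```

The pooled covariance matrix must be non-singular; with many correlated indicators and a very small suspect cluster, that condition deserves checking.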
Results
Thirteen indicators were developed in total: 2 related to costs, 4 related to the frequency and patterns of visits, and 7 related to prescription patterns. The analyses suggested that most indicators were useful either for fraud detection (4 indicators) or for abuse detection (8 indicators). Only 1 indicator (‘the average cost of a drug prescription claim’) was shared between abuse and fraud detection (Table 3). For all indicators, a higher value indicates a greater likelihood of abusive or fraudulent behavior.
Abuse Detection
Nine indicators were used to identify a critical cluster of physicians who were suspect of perpetrating abuse (Table 3). The silhouette coefficient was 0.50 with 2 clusters, suggesting fair clustering quality.16 The physicians in cluster 2 were more likely to perpetrate abusive behavior (suspect group). The analysis suggested that just over half of the general physicians in the province (54%) were suspect of abusive behavior, manifested mainly as prescription patterns that did not follow the agreed standards. Physicians in the suspect group tended to prescribe more medicines and injectables per prescription, and on average their prescription claims were about 25% more expensive.
We then applied discriminant analysis to the remaining (one month of) data. It showed that about 84.8% of physicians (139 of 164) remained in the same healthy or suspect category to which they had been assigned (Table 4).
Table 4. Results of Applying Discriminant Analysis to One Month of Data for Detecting Physicians Suspected of Perpetrating Abuse or Fraud.
Abuse Detection |
Original Cluster | Predicted Cluster 1 | Predicted Cluster 2 | Total |
1 | 63 (84.00%) | 12 (16.00%) | 75 (100%) |
2 | 13 (14.60%) | 76 (85.39%) | 89 (100%) |
Fraud Detection |
Original Cluster | Predicted Cluster 1 | Predicted Cluster 2 | Total |
1 | 159 (99.37%) | 1 (0.62%) | 160 (100%) |
2 | 2 (50.00%) | 2 (50.00%) | 4 (100%) |
Percentages are of the original cluster (row) total.
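The agreement figures reported for the discriminant analysis follow from the counts in Table 4, and the snippet below recomputes them. The per-cluster breakdown is our addition: it shows how much of each original cluster was retained, which the overall figure (dominated by the large healthy cluster) does not reveal on its own.

```python
# Confusion matrices from Table 4: rows = original cluster, columns = predicted cluster.
ABUSE = [[63, 12], [13, 76]]
FRAUD = [[159, 1], [2, 2]]


def overall_agreement(matrix):
    """Percentage of physicians kept in their original cluster."""
    total = sum(sum(row) for row in matrix)
    return 100 * sum(matrix[i][i] for i in range(len(matrix))) / total


def per_cluster_agreement(matrix):
    """Percentage retained within each original cluster (row-wise)."""
    return [100 * row[i] / sum(row) for i, row in enumerate(matrix)]


print(round(overall_agreement(ABUSE), 2), round(overall_agreement(FRAUD), 2))  # 84.76 98.17
print([round(v, 1) for v in per_cluster_agreement(ABUSE)])  # [84.0, 85.4]
print([round(v, 1) for v in per_cluster_agreement(FRAUD)])  # [99.4, 50.0]
```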
Fraud Detection
We used 5 indicators to detect fraudulent behavior. The fraud behavior of interest in this study involved collusion between physicians and pharmacies. Four of the fraud indicators were linked to physician behavior (mainly visit patterns), and we also defined a pharmacy-related indicator. We classified pharmacies into 3 categories (low-, medium-, and high-cost) based on the average cost of the general physicians’ prescription claims delivered by each pharmacy. We expected that physicians suspected of fraud would be more likely to collude with a high-cost pharmacy.
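The pharmacy-related indicator can be sketched in two stages: label each pharmacy by the average cost of the general physicians’ claims it delivered, then compute, for each physician, the proportion of claims dispensed at high-cost pharmacies. The study does not report how the 3 categories were bounded, so tertile cut points (via Python’s `statistics.quantiles`) are our assumption, as are the field names and toy claims. The sketch expresses the indicator as a proportion; the values in Table 3 (eg, 4.16) imply a different scaling that the text does not describe.

```python
from collections import defaultdict
from statistics import mean, quantiles


def pharmacy_cost_categories(claims):
    """Label each pharmacy low/medium/high by the average cost of the GP claims it delivered.

    Assumption: cut points are tertiles of the pharmacy averages; the study does
    not report how its 3 categories were bounded. Needs at least 2 pharmacies.
    """
    costs = defaultdict(list)
    for c in claims:
        costs[c["pharmacy_id"]].append(c["total_cost"])
    averages = {ph: mean(v) for ph, v in costs.items()}
    low_cut, high_cut = quantiles(averages.values(), n=3)
    return {
        ph: "low" if avg <= low_cut else "high" if avg > high_cut else "medium"
        for ph, avg in averages.items()
    }


def high_cost_share(claims, categories):
    """Per physician, the proportion of claims dispensed at high-cost pharmacies."""
    counts = defaultdict(lambda: [0, 0])  # physician_id -> [high-cost claims, all claims]
    for c in claims:
        tally = counts[c["physician_id"]]
        tally[1] += 1
        if categories[c["pharmacy_id"]] == "high":
            tally[0] += 1
    return {pid: high / total for pid, (high, total) in counts.items()}


claims = [
    {"physician_id": 1, "pharmacy_id": 12, "total_cost": 60000},
    {"physician_id": 1, "pharmacy_id": 12, "total_cost": 60000},
    {"physician_id": 1, "pharmacy_id": 35, "total_cost": 20000},
    {"physician_id": 2, "pharmacy_id": 35, "total_cost": 20000},
    {"physician_id": 2, "pharmacy_id": 87, "total_cost": 30000},
]
categories = pharmacy_cost_categories(claims)
print(categories)  # {12: 'high', 35: 'low', 87: 'medium'}
print(high_cost_share(claims, categories))  # {1: 0.6666666666666666, 2: 0.0}
```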
The analysis categorized the physicians into 2 clusters. Physicians in cluster 2 (suspect group) were more likely to have patients ‘visiting’ them more than once in a short period of time and to have their prescriptions dispensed at a particular pharmacy. Suspect physicians were about 50 times more likely to have their prescriptions dispensed in a high-cost pharmacy, and the average cost of their prescriptions was substantially higher (about 50%) than that of other physicians (Table 3).
The silhouette coefficient reached a maximum of 0.70 with 2 clusters, suggesting good clustering quality.16 Four physicians were identified as suspect of fraud; all of them had also been identified as suspect of abusive behavior in the abuse detection analysis.
We then applied discriminant analysis to the remaining one month of data, which showed that 98.17% (161 of 164) of the originally clustered cases were classified into the same cluster (Table 4). All but one of the physicians in cluster 1 remained in that cluster in the discriminant analysis, whereas 2 of the 4 physicians identified as ‘suspect’ in the clustering analysis moved to cluster 1.
Discussion
We clustered physicians using 13 extracted indicators and identified suspect groups of physicians for conducting fraud and abuse. The results of the discriminant analysis suggested that the indicators demonstrated adequate performance in the detection of physicians who were suspect of perpetrating fraud and abuse in a new sample of data.
The analyses identified a large proportion of the physicians as suspect of abusive behavior. This is in line with previous studies in Iran that demonstrated a high likelihood of ‘irrational’ prescribing (eg, a high proportion of prescriptions that include injectable medicines).26 The ‘irrational’ behavior of referring patients for ambulatory injections (for example, corticosteroids)27 can also be linked with physicians’ own gain via an adjacent injection service. Hence, the indicators were less able to distinguish physicians who were suspect of abusive behavior from those with merely lower-quality prescribing behavior. As such, it might be more efficient for insurance organizations to invest in provider behavior change interventions, such as audit and feedback and effective continuing education,27,28 rather than auditing claims for abuse. Such interventions would have the added benefits of reducing abusive behavior as well as potentially improving patient outcomes.
The results also identified a small proportion (just over 2%) of physicians as suspects of conducting the fraudulent behaviors of interest. This is similar to the findings of a study conducted in Esfahan province of Iran in 2012. They assessed a sample of physicians and used the auditor’s reports as an indicator of fraud. The study concluded that about 2.5% of general physicians had committed the fraudulent behavior of interest in our study (ie, "removing blank prescription pages from the patient’s logbook while the patient is unaware").29 The results of our study can be helpful to the insurance organizations in streamlining their auditing approaches towards the suspect groups.
The much larger proportion of physicians suspected of abusive behavior may indicate that the losses due to abuse are considerably greater than those due to fraud. Inefficient supervision of physicians and pharmacies in the private sector may allow abusive behavior and substandard care to prevail.
Using a data mining approach as a routine practice in health insurance organizations in Iran may require restructuring their information systems.30 Some data essential for detecting fraud and abuse may not be adequately recorded in the databases. For example, the geographical addresses of physicians' offices and pharmacies were not available in the database, so we could not use the distance between a physician's office and the pharmacy as an indicator for detecting fraud. Another problem in the data stemmed from the prevalence of dual (public and private) practice among physicians in Iran. The routine databases did not record this objectively, so we conducted several checks to identify physicians with a significant presence in the public sector among our sample of private sector physicians. Dual practice is common among physicians in LMICs,31 and other countries might face a similar limitation when applying data mining for fraud and abuse detection.
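If offices and pharmacies were geocoded, the office-pharmacy distance indicator could be derived roughly as sketched below. This is a minimal sketch assuming latitude/longitude coordinates were available, which they were not in this study; the haversine great-circle formula, the use of the median and the function names are our own illustrative choices.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def median_office_pharmacy_km(office, dispensing_pharmacies):
    """Hypothetical indicator: median distance from a physician's office (lat, lon)
    to the pharmacies (lat, lon) that dispensed his or her prescriptions."""
    return median(haversine_km(*office, *pharmacy) for pharmacy in dispensing_pharmacies)
```

The median rather than the mean is used so that a few long-distance fills do not dominate a physician's indicator value.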
Where such data exist, adding patients' age, gender and primary diagnosis could improve the performance of data mining in targeting certain subgroups of patients. Studies in Turkey and Taiwan used disease-treatment combinations as indicators for identifying disease-related fraudulent behaviors.7,15 Proper linkage between the insured population registration database and the physician and pharmacy service databases could improve claim handling and open further avenues for using data mining to detect fraud and abuse in insurance organizations. Our method was similar to that of a study in Taiwan,16 which used 10 indicators to cluster physicians' practice data. In that study, however, the number of clusters (10) was determined by experts, and a 2-stage unsupervised learning approach, combining a self-organizing map with principal component analysis, was used to compress the clusters into an optimal number and to identify the critical clusters.16
In a country with more than one major insurer, segmentation of the insurance market might act as an obstacle to detecting fraud and abuse. A limitation of our study is that we only focused on claims submitted to the SSO. Future studies should try to analyze the claims submitted to both major insurers in Iran simultaneously. A combined analysis of claims might be valuable for detecting certain abusive behaviors, such as not spending enough time with each patient, because a physician's workload may look plausible within each insurer's claims alone. A physician might profit from allocating short visit times to each patient, which could also result in substandard prescribing.
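To illustrate why pooling claims matters, the sketch below counts visits per physician per day across claims from more than one insurer and flags days above a threshold. The record layout (physician identifier and visit date), the function names and the threshold used in the example are hypothetical assumptions, not values from the study.

```python
from collections import Counter

def pooled_daily_visits(*claim_sources):
    """Count visits per (physician_id, visit_date) after pooling claims from several
    insurers. Each source is an iterable of (physician_id, visit_date) tuples."""
    counts = Counter()
    for source in claim_sources:
        counts.update(source)  # each (physician_id, visit_date) tuple counts as one visit
    return counts

def implausible_days(counts, max_visits_per_day):
    """Return the (physician_id, visit_date) pairs whose pooled visit count exceeds
    a hypothetical maximum number of visits per day."""
    return sorted(key for key, n in counts.items() if n > max_visits_per_day)
```

For example, with a hypothetical limit of 40 visits per day, a physician with 30 visits billed to one insurer and 25 to another on the same day passes each insurer's check separately but is flagged once the two sources are pooled.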
Our findings depend on the indicators selected for the analysis. Had we selected a different set of indicators (eg, for other behaviors of interest), the suspect groups of physicians might have been different.
An advantage of our study is that we selected the "physician", rather than the claim, as the unit of analysis. This was important because the data mining approaches are intended to guide the targeting of audits. Analyzing data at the physician level also substantially reduces the volume of data handled in each analysis. This makes data mining more manageable on personal computers, and hence more likely to be implemented in the low-resource settings of provincial and district offices of the insurance organizations.
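Moving the unit of analysis from claims to physicians amounts to a group-by over the claims table. The sketch below assumes a simplified, hypothetical claim record; its four indicators are illustrative stand-ins and are not the 13 indicators used in the study.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Claim:
    """Hypothetical, simplified prescription claim."""
    physician_id: str
    n_items: int          # number of drugs on the prescription
    has_injectable: bool  # whether any injectable medicine was prescribed
    cost: float           # claimed cost of the prescription

def physician_indicators(claims):
    """Collapse claim-level rows into one indicator dictionary per physician."""
    groups = defaultdict(list)
    for c in claims:
        groups[c.physician_id].append(c)
    out = {}
    for pid, cs in groups.items():
        n = len(cs)
        out[pid] = {
            "prescriptions": n,
            "items_per_prescription": sum(c.n_items for c in cs) / n,
            "injectable_share": sum(c.has_injectable for c in cs) / n,
            "mean_cost": sum(c.cost for c in cs) / n,
        }
    return out
```

However many claims a physician submits, the output holds a single row per physician, which is what keeps the downstream clustering small enough for a personal computer.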
In our study, we tried to conduct association rule mining to extract data-driven indicators, following the method proposed by Shan et al19 and explained earlier in the paper. We aimed to create a number of breaking rules as indicators for clustering (labeling) general physicians. However, we did not identify suitable breaking rules for clustering physicians, possibly because of substantial variation in the prescribing behavior of general physicians. Association rule mining might be more useful for disease-related data mining analyses (eg, for detecting fraud in the management of a chronic disease) and should be assessed in future studies.
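One simple reading of a "breaking rule" is an association rule A → B that holds for most prescriptions overall but that a given physician often violates (prescribing A without B). The sketch below mines single-item rules by support and confidence and then computes each physician's rule-breaking rate. It is an assumption-laden illustration, not a reproduction of Shan et al's method; the thresholds, drug codes and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def mine_pair_rules(transactions, min_support, min_confidence):
    """Return {(a, b): confidence} for single-item rules a -> b over a list of item lists."""
    n = len(transactions)
    item_counts, pair_counts = Counter(), Counter()
    for items in transactions:
        items = set(items)
        item_counts.update(items)
        pair_counts.update(permutations(sorted(items), 2))  # both directions a->b and b->a
    rules = {}
    for (a, b), n_ab in pair_counts.items():
        support = n_ab / n                 # share of all prescriptions containing a and b
        confidence = n_ab / item_counts[a] # share of prescriptions with a that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

def breaking_rates(prescriptions, rules):
    """For each physician, the share of rule-triggering cases (antecedent present)
    in which the consequent is missing, ie the rule is 'broken'."""
    triggered, broken = defaultdict(int), defaultdict(int)
    for physician_id, items in prescriptions:
        items = set(items)
        for a, b in rules:
            if a in items:
                triggered[physician_id] += 1
                if b not in items:
                    broken[physician_id] += 1
    return {pid: broken[pid] / triggered[pid] for pid in triggered}
```

Physicians with unusually high breaking rates would then be candidate outliers; with highly variable prescribing, as observed among general physicians, few rules may pass the support and confidence thresholds in the first place.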
Conclusion
The analyses identified a large proportion of the physicians as suspects of abusive behavior. The results also identified a small proportion (just over 2%) of physicians as suspects of conducting the fraudulent behaviors of interest. Our approach can help insurance organizations focus their limited resources for combating potential fraud on the small number of physicians with suspicious behavior, and use the results of detailed assessments to decide on any further action required.
Our data mining approach will help health insurance organizations in LMICs in streamlining auditing approaches towards the suspect groups rather than routine auditing of all physicians.
Acknowledgements
We thank the Social Security Organization, Tehran, Iran for the provision of the data, the Deputy Chancellor for Research of the Tehran University of Medical Sciences, Tehran, Iran for funding the study, and Dr. Ansaripour at the SSO for his invaluable assistance.
Ethical issues
The study was approved by the Research Ethics Committee of Tehran University of Medical Sciences, Tehran, Iran. We sought, and were granted, formal approval from the SSO to conduct the study. On the basis of this approval we were granted access to their relevant data sources, as physicians' claims data are not publicly available. We paid full attention to potential ethical issues in the conduct of the study; in particular, all physicians' personal data were anonymized before the analyses, and analysis reports were generated only from anonymized data files.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
Conception and design: HJ, AR, and BG; Acquisition of data: HJ, AR, MN, and BG; Analysis and interpretation of data: HJ, AR, BM, MM, BG, MN, and MA; Drafting of the manuscript: HJ, AR; Critical revision of the manuscript for important intellectual content: HJ, AR, BG, and MM; Statistical analysis: HJ, AR, MM, BM, MN, and BG; Obtaining funding: AR, MA, and HJ; Administrative, technical, or material support: AR, HJ, and MA; Supervision: AR, HJ, and MA.
Authors’ affiliations
1Health Economics Group, Social Security Organization, Tehran, Iran. 2Department of Health Management and Economics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran. 3School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran. 4Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran. 5Department of Education Management, School of Psychology and Education, University of Tehran, Tehran, Iran.
Key messages
Implications for policy makers
A large proportion of physicians are suspected of perpetrating abusive behavior, and a small proportion (just over 2%) are suspected of conducting the fraudulent behaviors of interest.
Third-party payers in low- and middle-income countries (LMICs) can focus auditing on the suspect groups rather than routinely auditing all physicians.
Implications for public
Up to 10% of total health expenditure is wasted because of fraud and abuse, amounting to billions of dollars per year. Traditional methods of detecting healthcare fraud and abuse rely on auditing procedures that are often time-consuming and ineffective in practice. Our research can help third-party payers, such as health insurance organizations, extract useful knowledge from thousands of claims and identify a smaller subset of claims or claimants for further scrutiny for fraud and abuse.
Citation: Joudaki H, Rashidian A, Minaei-Bidgoli B, et al. Improving fraud and abuse detection in general physician claims: a data mining study. Int J Health Policy Manag. 2016;5(3):165–172. doi:10.15171/ijhpm.2015.196
References
- 1. Health Financing. World Health Organization website. http://www.who.int/gho/health_financing/en/index.html. Accessed October 13, 2013.
- 2. Gee J, Button M, Brooks G, Vincke P. The financial cost of healthcare fraud. Portsmouth: University of Portsmouth, MacIntyre Hudson, Milton Keynes. http://eprints.port.ac.uk/3987/1/The-Financial-Cost-of-Healthcare-Fraud-Final-(2).pdf. Accessed December 20, 2010.
- 3. What is Fraud and Abuse? Department of Finance and Administration website. http://www.tn.gov/tnoig/WhatIsFraudAbuse.shtml. Accessed October 13, 2013.
- 4. Torras H. Health care fraud and abuse: a physician’s guide to compliance. American Medical Association Press; 2006.
- 5. Rashidian A, Joudaki H, Vian T. No evidence of the effect of the interventions to combat health care fraud and abuse: a systematic review of literature. PLoS One. 2012;7(8):e41988. doi:10.1371/journal.pone.0041988.
- 6. Li J, Huang KY, Jin J, Shi J. A survey on statistical methods for health care fraud detection. Health Care Manage Sci. 2008;11:275–287. doi:10.1007/s10729-007-9045-4.
- 7. Aral KD, Güvenir HA, Sabuncuoğlu İ, Akar AR. A prescription fraud detection model. Comput Methods Programs Biomed. 2012;106:37–46. doi:10.1016/j.cmpb.2011.09.003.
- 8. Ortega PA, Figueroa CJ, Ruz GA. A medical claim fraud/abuse detection system based on data mining: a case study in Chile. Paper presented at: International Conference on Data Mining; 2006; Las Vegas, Nevada. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.176.796&rep=rep1&type=pdf. Accessed October 13, 2013.
- 9. Chaix-Couturier C, Durand-Zaleski I, Jolly D, Durieux P. Effects of financial incentives on medical practice: results from a systematic review of the literature and methodological issues. Int J Qual Health Care. 2000;12(2):133–142. doi:10.1093/intqhc/12.2.133.
- 10. Kalb PE. Health care fraud and abuse. JAMA. 1999;282(12):1163–1168. doi:10.1001/jama.282.12.1163.
- 11. Maimon OZ, Rokach L, eds. Data mining and knowledge discovery handbook. New York: Springer; 2005. doi:10.1007/978-0-387-09823-4.
- 12. Joudaki H, Rashidian A, Minaei-Bidgoli B, et al. Using data mining to detect health care fraud and abuse: a review of literature. Glob J Health Sci. 2014;7(1):194–202. doi:10.5539/gjhs.v7n1p194.
- 13. Bolton RJ, Hand DJ. Statistical fraud detection: a review. Stat Sci. 2002;17(3):235–249. doi:10.1214/ss/1042727940.
- 14. Liou FM, Tang YC, Chen JY. Detecting hospital fraud and claim abuse through diabetic outpatient services. Health Care Manage Sci. 2008;11:353–358. doi:10.1007/s10729-008-9054-y.
- 15. Yang WS, Hwang SY. A process-mining framework for the detection of healthcare fraud and abuse. Expert Syst Appl. 2006;31:56–68. doi:10.1016/j.eswa.2005.09.003.
- 16. Lin C, Lin CM, Li ST, Kuo SC. Intelligent physician segmentation and management based on KDD approach. Expert Syst Appl. 2008;34:1963–1973. doi:10.1016/j.eswa.2007.02.038.
- 17. Shin H, Park H, Lee J, Jhee WC. A scoring model to detect abusive billing patterns in health insurance claims. Expert Syst Appl. 2012;39:7441–7450. doi:10.1016/j.eswa.2012.01.105.
- 18. Shan Y, Murray DW, Sutinen A. Discovering inappropriate billings with local density based outlier detection method. Paper presented at: The Eighth Australasian Data Mining Conference; 2009; Melbourne, Australia. http://crpit.com/confpapers/CRPITV101Shan.pdf. Accessed October 13, 2013.
- 19. Shan Y, Jeacocke D, Murray DW, Sutinen A. Mining medical specialist billing patterns for health service management. Paper presented at: The 7th Conferences in Research and Practice in Information Technology; 2008; Australia. http://dl.acm.org/citation.cfm?id=2449306. Accessed October 13, 2013.
- 20. Sparrow MK. Health care fraud control: understanding the challenge. J Insur Med. 1996;28(2):86–96.
- 21. Williams G, Huang Z. Mining the knowledge mine: the Hot Spots methodology for mining large real world databases. Lect Notes Comput Sci. 1997;1342:340–348.
- 22. Rashidian A, Joudaki H. Assessing medical misconduct and complaints in Iran health system: a systematic review of literature. Sci J Forensic Med. 2010;15(4):234–243. [In Persian].
- 23. Rashidian A, Khosravi A, Khabiri R, et al. Islamic Republic of Iran's Multiple Indicator Demographic and Health Survey (IrMIDHS) 2010. Tehran: Ministry of Health and Medical Education, 2012. [In Persian].
- 24. Social Security Organization (SSO). Annual Report for 2012. The Bureau of Statistics and Socio-economic Measurement, Social Security Organization; 2013. [In Persian].
- 25. Tatsuoka MM. Multivariate analysis in educational research. Rev Res Educ. 1973;1:273–319.
- 26. Soleymani F, Valadkhani M, Dinarvand R. Challenges and achievements of promoting rational use of drug in Iran. Iran J Public Health. 2009;38(suppl 1):166–168.
- 27. Esmaily HM, Silver I, Shiva S, et al. Can rational prescribing be improved by an outcome-based educational approach? A randomized trial completed in Iran. J Contin Educ Health Prof. 2010;30(1):11–18. doi:10.1002/chp.20051.
- 28. Garjani A, Salimnejad M, Shamsmohamadi M, et al. Effect of interactive group discussion among physicians to promote rational prescribing. East Mediterr Health J. 2009;15(2):408–415.
- 29. Ghodoosi A, Abedi HA, Mansouri A, Riaziat A. Evaluating the cases of violating the regulations of medical services insurance organization in Isfahan province, Iran. Health Inf Manag. 2012;9(3):339–347. [In Persian].
- 30. Rainsford C, Roddick J. Database issues in knowledge discovery and data mining. Australasian J Info Syst. 1999;6(2):101–128.
- 31. Ferrinho P, Van Lerberghe W, Fronteira I, Hipólito F, Biscaia A. Dual practice in the health sector: review of the evidence. Hum Resour Health. 2004;2(1):14. doi:10.1186/1478-4491-2-1.