Table 1.
Primary studies that used data mining for detecting health care fraud and abuse
| Study topic (country) | First author (year) | Data mining approach | Type of detected fraud | Applied data mining technique(s) |
|---|---|---|---|---|
| Healthcare fraud detection: A survey and a clustering model incorporating Geo-location information (US) | Liu (2013) | Unsupervised | Insurance subscribers’ fraud | Clustering |
| Application of Bayesian Methods in Detection of Healthcare Fraud (-) | Ekin (2013) | Unsupervised | Conspiracy fraud involving more than one party | Bayesian co-clustering |
| Unsupervised labeling of data for supervised learning and its application to medical claims prediction (US) | Ngufor (2013) | Hybrid supervised and unsupervised | Provider fraud (Obstetrics claims) | Unsupervised data labeling and outlier detection, classification and regression |
| Outlier based predictors for health insurance fraud detection within U.S. Medicaid (US) | Capelleveen (2013) | Unsupervised | Provider fraud (Dental claim data) | Outlier detection |
| A scoring model to detect abusive billing patterns in health insurance claims (Korea) | Shin (2012) | Supervised | Provider fraud (Outpatient clinics) | Six statistical techniques, including correlation analysis, logistic regression, and classification tree |
| A fraud detection approach with data mining in health insurance (Turkey) | Kirlidog (2012) | Supervised | Provider fraud | Support vector machine (SVM) |
| Applying Business Intelligence Concepts to Medicaid Claim Fraud Detection (US) | Copeland (2012) | Unsupervised | Provider fraud | Visualization by histogram |
| A prescription fraud detection model (Turkey) | Aral (2012) | Hybrid supervised and unsupervised | Prescription fraud | Distance-based correlation and risk matrices |
| Unsupervised fraud detection in Medicare Australia (Australia) | Tang (2011) | Unsupervised | Insurance subscribers’ fraud | Clustering, feature selection and outlier detection |
| Two models to investigate Medicare fraud within unsupervised databases (US) | Musal (2010) | Unsupervised | Provider fraud | Clustering algorithms, regression analysis, and various descriptive statistics |
| Data mining to predict and prevent errors in health insurance claims processing (US) | Kumar (2010) | Supervised | Error in providers claims | Support vector machine (SVM) |
| Discovering inappropriate billings with local density based outlier detection method (Australia) | Shan (2009) | Unsupervised | Provider fraud (Optometrists Billing) | Local density based outlier detection |
| Mining medical specialist billing patterns for health service management (Australia) | Shan (2008) | Unsupervised | Provider fraud (Specialist billing) | Association rules |
| Detecting hospital fraud and claim abuse through diabetic outpatient services (Taiwan) | Liou (2008) | Supervised | Provider fraud (Diabetic outpatient services) | Logistic regression, neural network, and classification trees |
| A process-mining framework for the detection of healthcare fraud and abuse (Taiwan) | Yang (2006) | Supervised | Provider fraud (Gynecology services) | Classification based on associations algorithm, feature selection by Markov blanket filter |
| A medical claim fraud/abuse detection system based on data mining: a case study in Chile (Chile) | Ortega (2006) | Supervised | Provider fraud | Neural network |
| EFD: A Hybrid Knowledge/Statistical-Based System for the Detection of Fraud (US) | Major (2002) | Hybrid supervised and unsupervised | Provider fraud | Outlier detection and rule extraction |
| Application of Genetic Algorithms and k-Nearest Neighbour method in real world medical fraud detection problem (Australia) | He (1999) | Unsupervised | Provider fraud (General practitioners) | Genetic algorithm and K-Nearest Neighbor clustering |
| Evolutionary Hot Spots data mining: architecture for exploring for interesting Discoveries (Australia) | Williams (1999) | Hybrid supervised and unsupervised | Insurance subscribers’ fraud | Clustering and rule induction |
| Mining the knowledge mine: The Hot Spots methodology for mining large real world databases (Australia) | Williams (1997) | Hybrid supervised and unsupervised | Insurance subscribers’ fraud | Clustering and C5.0 classification algorithm |
| Application of neural networks to detection of medical fraud (Australia) | He (1997) | Supervised | Provider fraud (General practitioners) | Neural network |