ABSTRACT
Health care prescription fraud and abuse result in major financial losses and adverse health effects. The growing budget deficits of health insurance programs and recent opioid drug abuse crisis in the United States have accelerated the use of analytical methods. Unsupervised methods such as clustering and anomaly detection could help the health care auditors to evaluate the billing patterns when embedded into rule-based frameworks. These decision models can aid policymakers in detecting potential suspicious activities. This manuscript proposes an unsupervised temporal learning-based decision frontier model using the real world Medicare Part D prescription data collected over 5 years. First, temporal probabilistic hidden groups of drugs are retrieved using a structural topic model with covariates. Next, we construct combined concentration curves and Gini measures considering the weighted impact of temporal observations for prescription patterns, in addition to the Gini values for the cost. The novel decision frontier utilizes this output and enables health care practitioners to assess the trade-offs among different criteria and to identify audit leads.
KEYWORDS: Multivariate anomaly detection, decision models, Medicare Part D, topic model, prescription patterns, health care fraud
Highlights
This work presents an integrated statistical learning and decision framework based on multicriteria concentration curves for drug prescription anomaly decision making over time.
The model first utilizes a natural language processing algorithm (i.e. structural topic model) to detect prescription patterns. It then introduces different outlier detection approaches to detect anomalies for overall prescription patterns or certain drug groups such as opioids. The final results of this unsupervised model will be embedded into a visual tool along with decision frontiers determined based on different risk thresholds.
The visual tool enables health care practitioners or auditors to assess the trade-offs among different criteria and identify audit leads to detect aberrant prescription behaviors.
The proposed framework is modular and can be integrated in other decision support tools to help detect anomalies and generate investigation leads. It is also general enough to study prescription data over different time frames, such as monthly billings.
1. Introduction
Health care prescriptions are the instructions written by medical practitioners that authorize patients to be provided medicine. Aging populations and more widespread access to care have increased the number of prescriptions, especially in the developed world. In the United States (U.S.), the Centers for Medicare and Medicaid Services (CMS) estimates that prescription drug expenditure was 335 billion U.S. dollars in 2018 [12]. Medicare Part D is the largest prescription program, serving more than 43 million beneficiaries with more than 120 billion dollars in annual spending [35]. Sixty-six percent of all adults in the U.S. are estimated to use prescription drugs, while the number of prescriptions was around 4.21 billion in 2018, corresponding to approximately $1200 in spending per person. The percentage of people in the U.S. using at least one prescription drug in the past 30 days was reported as 48.4% in 2016 [10].
It is estimated that 3 to 10% of overall health care spending is lost to overpayments in the form of fraud, waste, and abuse [14]. Prescription fraud refers to the execution of illegitimate prescriptions, which mostly take place in the form of drug diversion. For instance, opioid abuse is a multi-billion dollar problem that corresponds to 64% of prescription fraud. The actions to address the growing concerns have gained momentum in the last few years. The 2017 National Health Care Fraud Takedown charged 412 defendants for their alleged participation in health care fraud schemes involving approximately $1.3 billion in false billings [36]. Most of these prescription fraud investigations have been initiated based on beneficiary tips and complaints [35], which limits their extent. The cost of health care audits and the size and complexity of the programs require the use of analytical methods for finding leads in prescription audits. As a result, statistical health care fraud assessment has been an emerging area in the scholarly literature with the increasing availability of public data sources; see Li et al. [28] and Ekin et al. [18] for comprehensive overviews, and emerging applications [4,15,16,19].
This paper focuses on prescription audits with an emphasis on the use of unsupervised methods to detect potential audit leads. Unsupervised methods have become popular to detect prescription fraud due to their ease of use and communication in addition to lack of labeled data. Even simple descriptive statistical analysis coupled with visualization can be beneficial to understand prescription patterns. Nordmann et al. [34] studied geographical variations in opioid use with a focus on doctor shopping as a means for drug diversion. Kaye et al. [23] reviewed the opioid abuse predictors and presented curbing strategies. Brownstein et al. [7] used geographical volume data to construct risk maps of prescription opioid abuse and identified clusters of abuse. Similarly, Liu et al. [29] developed a graph analysis technique analyzing relation and referrals among pharmacies, doctors, and patients. They focused on narcotics prescriptions and detected anomalies in latent communities. However, these proposed methods may not be able to handle heterogeneous and multi-dimensional data. Multivariate anomaly detection and clustering methods could address those concerns in the healthcare fraud assessment [8]. For prescription fraud, Aral et al. [1] proposed a distance-based unsupervised algorithm to assess the fraudulent risk of prescriptions. Iyengar et al. [21] identified anomalies for each prescription area with a rule-based approach. Johnson and Nagarur [22] combined the average number and length of visits, patient demographics, and claim amounts into a risk function, which is embedded into a decision tree and an expected utility model. Kose et al. [25] proposed an interactive machine learning approach that offers a proactive real-time analysis by comparing the risk levels of claims with abnormal behavior types. The risk levels are computed using aggregate weighted z scores of multiple attributes. Haddad Soleymani et al. [20] proposed an unsupervised method that computes individual frequency matrices based on medicine code, sex, and age, and compares the resulting risk scores with static thresholds in order to flag outlier prescriptions. The multivariate anomaly detection method of Zafari and Ekin [41] is shown to create benchmark groups and identify aberrant billing patterns.
These models are applied at a particular time and the evolution of learning and temporal changes are not considered. Such static rules are vulnerable to changing patterns over time, and can be overcome by adaptive fraudsters. Also, they mostly lack the decision aspect. In order to analyze changes over time, industry practitioners generally conduct trend analysis with a linear regression, which is later embedded into a rule-based decision tool. Kose et al. [25] is an exception as it provides a dynamic outlier detection supported decision system which visualizes the fraudulent classes over time. However, to the best of our knowledge, there are not any methods that combine temporal analytical analysis, anomaly detection methods, and decision models.
This paper fills that gap by addressing these challenges with an integrated statistical learning and decision framework that incorporates the output from a temporal unsupervised method and multicriteria anomaly detection analysis into a decision model. We use structural topic models [37] to analyze prescription patterns by accounting for time in modeling the frequency of occurrence of drugs in the drug groups, ‘topical content’. This enables us to consider the prescription frequency of each drug group for each medical specialty at a given time, and provides temporal probabilistic groups of drugs. The temporal associations among providers and prescribed drugs can help understand hidden patterns which would otherwise be missed, and enable auditors to discover potential cases of drug diversion over time. The output is evaluated within a multicriteria anomaly detection framework that uses concentration functions. The anomalous patterns are retrieved with respect to multiple criteria such as prescription frequency and cost. This paper is also the first application work that extends the univariate concentration functions of Ekin et al. [17] to incorporate multiple concentration curves over time. The proposed unsupervised framework can enable identification of leads for audits of providers prescribing excessive or medically unnecessary drugs across multiple years. Lastly, we embed this multivariate anomaly detection output into a decision framework, which considers multiple criteria of prescription frequency, provider type, time and cost. This flexible graphical decision frontier tool is based on weighted combinations of individual Gini measures or percentiles. We discuss the practical implications of the method where the exceedance above auditor-specified thresholds provides warnings for further exploration.
The rest of this paper is structured as follows. Section 2 describes the proposed methodology: temporal probabilistic cluster construction by structural topic modeling, detection of anomalous prescriber groups using multivariate concentration curves, and the decision frontiers that are used to assess the trade-offs. Section 3 presents the analysis with a discussion of data pre-processing, model initialization, results of the temporal prescription patterns, anomaly detection with respect to frequency, cost, and opioid abuse. These are incorporated into a decision tool that graphically reveals the trade-offs. The paper concludes with a discussion of challenges and future work.
2. Methodology
This paper proposes a multicriteria decision frontier that utilizes the unsupervised hidden output of temporal structural topic models and multivariate anomaly detection methods. The proposed methodological framework can be summarized into three parts: formation of temporal probabilistic drug groups using the structural topic models, detection of anomalous prescriber groups with respect to hidden billing patterns and costs of drug groups, and constructing decision frontiers that help assess the trade-offs involved with multiple criteria. Figure 1 shows the overall process and in the following sections, we describe each of these in detail.
Figure 1.
The algorithmic flow chart for the proposed anomaly detection based decision process.
2.1. Temporal probabilistic drug groups
We construct temporal probabilistic clusters (i.e. drug groups) by using the structural topic models with covariates. While a similar approach is taken in [41], the current model extends the previous work by accounting for time and the evolution of drug groups with respect to both topical prevalence and topical content. By doing so, we model temporal changes in both prescription patterns of providers from different drug groups as well as the drug frequencies within drug groups.
Structural topic models are generative statistical latent variable models. In particular, we use a generative model for D prescribers over K drug groups (clusters/topics) for T time periods over all the drugs from a drug list indexed by $v = 1, \dots, V$. The main dataset (corpus) is created by generating a separate document for each prescriber at time t, where the name of each of the prescriber's billed drug v is repeated the number of times of prescription. We use time, medical specialty, and average beneficiary risk score as features for each prescriber. These covariates are denoted as $X_{d,t}$ for each prescriber $d = 1, \dots, D$ and time $t = 1, \dots, T$. For a given prescriber at a given time period, the average beneficiary risk score reflects the aggregate health level of patients based on age, sex, prior medical diagnoses, and other criteria following the Hierarchical Condition Category (HCC) method [13]. Our model assumes that providers with the same specialty in the same year with a similar type of patients tend to have similar drug prescription patterns.
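The corpus construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout `(prescriber, year, drug, claim_count)`, the function name `build_documents`, and the toy records are assumptions and are not taken from the Medicare Part D files.

```python
from collections import defaultdict


def build_documents(claims):
    """Build one 'document' per (prescriber, year) pair.

    claims: iterable of (prescriber_id, year, drug_name, claim_count) tuples
    (a hypothetical layout chosen for this sketch).
    """
    docs = defaultdict(list)
    for prescriber, year, drug, count in claims:
        # Repeat the drug name once per claim, so that token counts in the
        # document equal the number of times the drug was prescribed.
        docs[(prescriber, year)].extend([drug] * count)
    return dict(docs)


# Hypothetical toy records, for illustration only.
toy_claims = [
    ("dr_a", 2013, "drug_x", 3),
    ("dr_a", 2013, "drug_y", 1),
    ("dr_a", 2014, "drug_x", 2),
]
documents = build_documents(toy_claims)
```

Each resulting document would then be paired with its covariates (time, medical specialty, and average beneficiary risk score) before fitting the topic model.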
The topical prevalence $\theta_{d,t}$ is defined as the mixture distribution over the K drug groups for each prescriber d at time t. The topical content parameter $\beta_{k,t}$ is a V-dimensional vector, where each component, $\beta_{k,t,v}$, corresponds to the probability of the drug v being part of the drug group k at time t. In its estimation, deviations from the background distributions of all prescriptions (i.e. corpus) are specified as functions of drug groups, of the observed covariate (levels of time), and of drug group-covariate interactions. To keep this manuscript concise, the details of the structural topic model are provided in Appendix A.1 for readers who are not familiar with it.
2.2. Anomaly detection measures
We use the probabilistic membership of drugs to drug groups, $\beta_{k,t}$, and the probabilistic membership of prescribers to drug groups, $\theta_{d,t}$, as inputs to anomaly detection methods to retrieve suspicious hidden prescription patterns over time. This section presents a discussion of anomaly detection and concentration curves, then introduces the proposed methods to detect anomalous prescriber groups over time.
Anomaly methods are widely used to detect suspicious billing activities by finding observations that do not comply with the benchmark grouping of the data. A standard method is to use descriptive statistical methods and analyze the top claims as much as the budget allows. Capelleveen [8] provides an overview of outlier detection methods based on linear models, boxplots, peak analysis, multivariate clustering and expert evaluation. Local density-based outlier detection [38] and Benford's law distributions [30] are also among the methods used in statistical health care fraud assessment. The Bayesian model of Bauder and Khoshgoftaar [5] uses credibility intervals to assess outliers. This paper, however, uses concentration curves due to their power in data aggregation and visualization.
The Lorenz curve [32] has become popular through its use in wealth (in)equality analysis by comparing the actual income distribution with a uniformly distributed one. This graphical tool plots the cumulative income ordered by the income sizes while the benchmark uniform income corresponds to the straight line. The concentration function is a generalized Lorenz curve that allows comparing any pair of probability distributions on the same measurable space [11]. In order to quantify the overall discrepancies between the distributions, the Gini coefficient measures the area between the curve and the straight line. The larger the discrepancy between a selected distribution and the benchmark distribution, the farther away the curve will be from the straight line, which leads to larger values of Gini; the maximum is 1.
The health care applications of the Lorenz curve include, but are not limited to, the analysis of resource consumption [31], response time in emergency medical services [27], and patient choice [40]. The use of the concentration function is introduced to health care fraud assessment by Ekin et al. [17] as a pre-screening tool that analyzes health care providers' billing patterns. Zafari and Ekin [41] utilize it with the clustering output to identify prescription discrepancies. However, such existing work focuses on univariate dimensions at a given time. In general, use of unsupervised outlier detection with high-dimensional numerical data brings challenges, see Zimek et al. [42] for an overview. In particular, the extensions of Lorenz curves to higher dimensions are mostly not adopted, as either they are not applicable to more than two dimensions [39] or because their computation is challenging [26]. Recently, Arnold and Sarabia [2] propose a method based on mixtures of Lorenz surfaces. However, to the best of our knowledge, there are no applications of multivariate Lorenz/concentration curves in health care literature.
Consider a prescriber (doctor) d with medical specialty m, where $d \in \{1, \dots, D\}$ and $m \in \{1, \dots, M\}$. In our notation, we refer to $\theta_{d,t,k}$ as the kth drug group proportion for prescriber d recorded at time t, and to $\bar{\theta}_{m,t,k}$ as the average of the kth drug group proportion for all the prescribers with specialty m at time t. Considering all the drug groups, we now have two K-size vectors $\theta_{d,t}$ and $\bar{\theta}_{m,t}$. The discrepancy between these two vectors is visualized by the concentration functions, and summarized by Gini's area of concentration score. However, the vectors are measured for a given time and still need to be aggregated over time. To combine concentration functions of multiple time periods, we use the weighted average based approach of Arnold and Sarabia [2]. In particular, we aggregate over time the differences between the drug group distribution of the prescriber d and the drug group distribution of all prescribers with medical specialty m, that is, $\theta_{d,t}$ and $\bar{\theta}_{m,t}$ respectively. The underlying assumption of the proposed method is that time-aggregated prescription patterns of prescribers with the same specialty and similar beneficiary profiles would be similar.
In the following, we present the computation of Gini scores with respect to prescription patterns and cost and over-prescription of specific drug types as important factors to determine investigation leads in audits. One can simply extend these for other criteria or specific drug-type measures.
2.2.1. Measures with respect to prescription patterns
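For a given prescriber d at time t, drug groups are ordered in an ascending order based on their likelihood ratios of $\theta_{d,t,k} / \bar{\theta}_{m,t,k}$. The concentration function is drawn by connecting these respective cumulative drug group distribution points $(x_i, y_i)$, $i = 0, 1, \dots, K$, where $x_0 = y_0 = 0$, $x_i = \sum_{j=1}^{i} \bar{\theta}_{m,t,(j)}$, and $y_i = \sum_{j=1}^{i} \theta_{d,t,(j)}$ for the ordered drug groups $(1), \dots, (K)$. The increasing curve connecting these points is called the concentration function of probability measure $\theta_{d,t}$ with regards to $\bar{\theta}_{m,t}$ at time t. The respective Gini's area of concentration score at time $t$ is computed as
$$G_{d,t} = 1 - \sum_{i=1}^{K} (x_i - x_{i-1})(y_i + y_{i-1}) \tag{1}$$
The time-aggregated (i.e. weighted average) of the respective Gini scores is recorded as
$$G_{d} = \sum_{t=1}^{T} w_t \, G_{d,t} \tag{2}$$
where $\sum_{t=1}^{T} w_t = 1$.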
In this paper, we aggregate the Gini scores for different years using equal weights. Alternatively, one can use different weights with diminishing rates for the past to increase the impact of recent years.
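As a rough illustration of the computations in this subsection, the following Python sketch orders drug groups by likelihood ratio, accumulates the piecewise-linear concentration curve, and returns one minus twice the area under that curve, so that identical distributions score 0 and the score cannot exceed 1. The normalization, the function names, and the toy inputs are assumptions made for illustration; the sketch also assumes strictly positive benchmark proportions.

```python
def gini_concentration(p, q):
    """Gini's area of concentration of distribution p relative to benchmark q.

    p, q: sequences of K proportions, each summing to 1, with all q[k] > 0.
    """
    # Order drug groups by ascending likelihood ratio p[k] / q[k].
    order = sorted(range(len(p)), key=lambda k: p[k] / q[k])
    x_prev = y_prev = 0.0
    area_under_curve = 0.0
    for k in order:
        x, y = x_prev + q[k], y_prev + p[k]
        # Trapezoid between consecutive points of the concentration curve.
        area_under_curve += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    # Twice the area between the diagonal and the curve.
    return 1.0 - 2.0 * area_under_curve


def time_aggregate(yearly_scores, weights=None):
    """Weighted average of yearly scores; equal weights by default."""
    if weights is None:
        weights = [1.0 / len(yearly_scores)] * len(yearly_scores)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * g for w, g in zip(weights, yearly_scores))


# Hypothetical toy example with K = 3 drug groups.
benchmark = [1 / 3, 1 / 3, 1 / 3]
same_as_peers = gini_concentration(benchmark, benchmark)
concentrated = gini_concentration([1.0, 0.0, 0.0], benchmark)
aggregated = time_aggregate([0.2, 0.4])
```

Passing a decreasing weight vector for older years to `time_aggregate` would give the recency-weighted variant mentioned above.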
2.2.2. Measures with respect to prescription costs
In addition to the prescription patterns, auditors also often investigate prescription costs to determine the claims for further drill-downs. Hence, we also compute time-aggregated Gini scores for the cost aspect.
Let $c_{d,t}$ be a V-dimensional row vector capturing the drug cost distribution of doctor d at time t, where V is the total number of distinct drugs prescribed by all the providers over the study horizon. We let $\gamma_{d,t,k}$ denote the relative cost distribution of prescriber d for the kth drug group at time t. $\gamma_{d,t}$ is a K-dimensional vector consisting of $\gamma_{d,t,k}$ over the K drug groups at time t, and $\bar{\gamma}_{m,t}$ is the mean cost distribution for the prescribers with medical specialty m.
For a given prescriber d at time t, the drug groups are ordered with respect to their cost likelihood ratio, denoted as $\gamma_{d,t,k} / \bar{\gamma}_{m,t,k}$. Concentration curves and subsequently Gini scores, $G^{c}_{d,t}$, are calculated similarly to the steps described in Section 2.2.1. Finally, the time-aggregated Gini cost score for each doctor d with specialty m is computed as
$$G^{c}_{d} = \sum_{t=1}^{T} w_t \, G^{c}_{d,t} \tag{3}$$
The weights, $w_t$, with $\sum_{t=1}^{T} w_t = 1$, can be determined by the auditors with respect to the importance of each year. We assume them to be equal to the weights in the previous subsection, where $w_t = 1/T$.
2.2.3. Measures with respect to drug types
By utilizing Gini summary measures, one would flag anomalous prescribers with very high or very low prescription frequencies or costs. However, auditors are often mainly interested in prescribers with high prescription frequencies. These excessive prescriptions can be captured by a one-sided measure that considers drug types such as opioid, antibiotic, etc. By considering drug type in addition to prescription patterns, auditors can further narrow down their investigation leads. To demonstrate how other measures can be included in the framework, we estimate a time-aggregated extension of the opioid score of Zafari and Ekin [41]. For prescriber d with specialty m, the time-aggregated opioid score is defined as
| (4) |
where and is a binary indicator to show whether drug v is opioid or not. While we only estimate this one-sided anomaly detection measure for opioid drugs, the proposed score can be easily generalized to capture high deviations for other drug types as well.
2.3. Decision frontiers
The final challenge is to incorporate the anomaly detection output into a decision-making framework for further investigations. In order to make audit decisions based on unsupervised model outputs, we can consider numerical and/or graphical decision tools. Our focus is on graphical decision frontiers for multicriteria comparison. In particular, these tools allow trade-off analysis by combining the output of aforementioned methods. The decision frontier curves and lines are drawn to define similar segments and to help with the visual analysis. They reflect the prescription activities of a given prescriber compared to the peers. They are adaptive to changes in data; and can be used as a dynamic tool. We utilize two approaches to construct decision frontiers, the case where the auditor can specify one common joint threshold for all criteria or an individual threshold for each criterion.
The joint threshold approach is based on plotting the Gini measures $(G_1, G_2)$ of each prescriber in the bi-dimensional space and then drawing the curves corresponding to $G_1^2 + G_2^2 = c$, where the $c$'s are thresholds calculated based on percentiles (e.g. 75th, 90th, 95th, 99th) of the sum of squares of the two metrics. Here, we are interested in the distance of the point from the origin. All the points above the curves represent observations beyond the specified threshold and are candidates for further investigation, with urgency increasing as the risk level of the curve increases. The thresholds can be changed interactively, e.g. when the auditor wants to flag a predetermined top proportion of the prescribers. Other than the curved line presented here, one could also consider a linear combination of the metrics. In addition, different weights could be applied to the two metrics to emphasize a criterion.
The second approach is based on using percentiles of each criterion as individual thresholds rather than joint thresholds. Prescribers that have a Gini measure above such percentile can be flagged for that particular criterion. In the cases of two criteria, percentiles determine rectangles in the right top corner of the unit square (remember that the Gini measure does not exceed 1). All the points falling in such rectangles are audit candidates for falling into the predetermined percentile group. In this case, percentiles can also be adjusted interactively to allow for more or less points in the rectangles. It is also possible to consider different percentiles for different criteria.
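The two frontier constructions can be sketched as follows. This is an illustrative sketch only: the linear-interpolation percentile rule, the function names, and the toy Gini values are assumptions, and the paper does not state which percentile convention is used.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def joint_flags(g1, g2, q=90):
    """Joint threshold: flag points outside the curve g1^2 + g2^2 = c,
    where c is the q-th percentile of the sum of squares."""
    sum_sq = [a * a + b * b for a, b in zip(g1, g2)]
    c = percentile(sum_sq, q)
    return [v > c for v in sum_sq]


def individual_flags(g1, g2, q1=90, q2=90):
    """Individual thresholds: flag points in the top-right rectangle,
    i.e. above the chosen percentile on both criteria."""
    t1, t2 = percentile(g1, q1), percentile(g2, q2)
    return [a > t1 and b > t2 for a, b in zip(g1, g2)]


# Hypothetical Gini scores for five prescribers (pattern, cost).
gini_pattern = [0.1, 0.2, 0.3, 0.4, 0.9]
gini_cost = [0.2, 0.1, 0.3, 0.5, 0.8]
joint = joint_flags(gini_pattern, gini_cost, q=75)
rect = individual_flags(gini_pattern, gini_cost, q1=75, q2=75)
```

Lowering `q` widens the flagged region, which mirrors the interactive adjustment of thresholds described above.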
In addition to graphical decision tools, numerical approaches could be used especially for a large number of criteria. Suppose we retrieve Gini measures for a doctor for N criteria: $G_1, \dots, G_N$ (we omit the dependence on the prescriber). We can combine these measures to calculate a risk score using a weighted index, given by
$$S = \sum_{i=1}^{N} \omega_i \, G_i$$
where the weights $\omega_i \geq 0$ denote the relative importance, assigned by the auditors, of each criterion and are such that $\sum_{i=1}^{N} \omega_i = 1$. Observe that $0 \leq S \leq 1$. A simple numerical tool is represented by a common threshold R<1, set as well by the auditors, suggesting further investigations for any prescriber whose weighted index exceeds R. It is possible to determine a set of thresholds denoting an increasing level of risk averseness.
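A minimal sketch of the numerical tool, assuming the weighted index is a convex combination of the per-criterion Gini measures; the function names and the example weights and threshold are hypothetical and would be set by the auditors in practice.

```python
def risk_index(gini_scores, weights):
    """Weighted index of N Gini measures; weights must be non-negative and sum to 1."""
    if len(gini_scores) != len(weights):
        raise ValueError("one weight per criterion is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * g for w, g in zip(weights, gini_scores))


def needs_review(gini_scores, weights, threshold):
    """True when the weighted index exceeds the auditor-specified threshold R."""
    return risk_index(gini_scores, weights) > threshold


# Hypothetical example with three criteria.
score = risk_index([0.8, 0.4, 0.6], [0.5, 0.3, 0.2])
flagged = needs_review([0.8, 0.4, 0.6], [0.5, 0.3, 0.2], threshold=0.6)
```

Because each Gini measure lies between 0 and 1 and the weights sum to 1, the index stays in the same range, which is what makes a single threshold below 1 meaningful.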
3. Analysis
This section presents the application of the proposed framework that aids medical auditors in identifying providers with aberrant prescription behaviors. While we illustrate the results using the data from the state of New Hampshire, the proposed methods could also be used for other states.
3.1. Data pre-processing, variables, and model initialization
For this study, we use the Medicare Part D prescriber data of the state of New Hampshire over 5 years, from 2013 to 2017 [13]. The small state of New Hampshire has one of the highest drug overdose rates [9] with an alarming trend. Particularly, opioid overdose death rates, which account for most of these deaths, have been high [24].
As part of data pre-processing, we have further filtered the data to the top 20 medical specialties in terms of opioid claims per beneficiary, as more than half of the specialties never billed for opioids or had a very low number of relevant claims. While the inclusion of medical specialties with low opioid utilization rates does not have a major effect on the overall prescription patterns, it could potentially increase the number of false positives due to higher levels of heterogeneity. Considering all 5 years, the data has 1617 providers that submitted more than eleven million Medicare Part D claims for 981 distinct drugs. The number of distinct drugs that were prescribed in a given year by each doctor ranges from 1 to 243 with a median of 20. In contrast, the total number of claims that were submitted by each doctor (i.e. document sizes) in a given year ranges from 11 to 23,270 with a median of 598. Some of the prescription summaries for the prescribers during the study horizon are shown in Table 1. We can see some consistent patterns and some upward trends in terms of the number of submitted claims and the total cost of the claims, which could be partially due to the increase in Medicare Part D enrollments in this state from 136,010 in 2013 to 184,853 in 2017 [13]. Average beneficiary risk scores are observed to be relatively consistent with a slight upward trend. While the changes in these average values can indicate the overall trends, individual discrepancies can only be detected by explicitly modeling temporal changes and forming appropriate peer comparison benchmarks.
Table 1.
Prescription summaries from 2013 to 2017.
| Year | Total number of claims | Total cost of drugs | Average number of distinct drugs prescribed | Average beneficiary risk score |
|---|---|---|---|---|
| 2013 | 1,925,670 | $140,474,577 | 30.124 | 1.241 |
| 2014 | 2,130,846 | $176,187,354 | 32.344 | 1.208 |
| 2015 | 2,322,213 | $212,992,768 | 33.964 | 1.275 |
| 2016 | 2,402,135 | $225,107,205 | 33.993 | 1.279 |
| 2017 | 2,388,951 | $237,289,336 | 33.312 | 1.272 |
The proposed framework aims to capture the overall evolution over time while controlling for all relevant variables. In setting up the structural topic model, we consider different sets of covariates for topical prevalence and topical content. Topical prevalence covariates capture the contribution of different drug groups (i.e. topics) to doctors' prescriptions (i.e. documents). We consider medical specialty, prescription year, and average beneficiaries' risk score as the set of prevalence covariates that explains the variation in doctors' prescription patterns of different drug groups. Our initial computational results suggested that the average beneficiaries' risk score is a good proxy in capturing differences in patients' demographics and can explain differences in prescription patterns across regions. Therefore, the spatial covariates of zip-code and county are not included in the model for parsimony. As for the topical content covariate, we consider prescription year. The use of prescription year as a covariate makes it possible to model the evolution of each hidden drug group and prescribers' billing of those groups over time.
We initialized the model using the collapsed Gibbs sampler implementation of latent Dirichlet allocation [6] since it is shown to be superior for smaller corpus sizes (less than 40,000 documents) [3]. In order to choose the number of drug groups (i.e. topics), we tested models for 5–60 drug groups with increments of 5. We ultimately chose the model with 30 drug groups as it has a reasonable trade-off between drug group quality (i.e. semantic coherence [33] which measures the co-occurrence of the most frequent drugs of a given drug group) and predictive performance (i.e. out-of-sample likelihood measured as the probability of held-out drugs belonging to a doctor's prescription).
3.2. Drug groups and trends
The output of the topic model includes drug groups ($\beta_{k,t}$) and the prescription proportion of each prescriber from these drug groups ($\theta_{d,t}$). Each drug group is essentially a distribution over the set of all drugs with varying probabilities at a given time. In order to demonstrate the temporal aspects of the proposed framework, we focus on how the model captures changes in drug groups and prescribers' billing patterns over time.
CMS provides information on drug types using four common categories: opioid, long-acting opioid, antipsychotic, and antibiotic. Using this information, for instance, we can measure the overall expected frequency of opioids for each drug group k for a certain year t as $\sum_{v=1}^{V} \beta_{k,t,v} I_v$, where $I_v$ is a binary indicator to show whether drug v is opioid or not. A similar formula with different indicator vectors can be used for other drug types. The left panel of Figure 2 displays the expected opioid frequency of drug groups averaged over 5 years. For instance, we see that drug group 22 mostly represents opioid drugs. In other words, within this drug group, the top drugs (i.e. drugs with highest posterior frequencies) are opioid type drugs. However, it should be noted that these frequencies and the nature of drug groups could change over time. The right panel of Figure 2 shows the evolution trajectory of the most frequently billed drugs within drug group 22. The same analysis could be applied to other drug groups and/or for different drug types.
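The expected drug-type frequency of a drug group, and its average over years, can be sketched as below. The function names and the toy probabilities are hypothetical; the sketch simply sums the within-group drug probabilities of the drugs flagged as belonging to the type of interest.

```python
def expected_type_frequency(drug_probs, is_type):
    """Sum of within-group drug probabilities over drugs of the given type.

    drug_probs: probabilities of the V drugs within one drug group in one year.
    is_type: V booleans marking whether each drug belongs to the type (e.g. opioid).
    """
    return sum(p for p, flag in zip(drug_probs, is_type) if flag)


def average_over_years(drug_probs_by_year, is_type):
    """Average the yearly expected frequencies with equal weights."""
    freqs = [expected_type_frequency(p, is_type) for p in drug_probs_by_year]
    return sum(freqs) / len(freqs)


# Hypothetical drug group over V = 3 drugs, observed in two years.
is_opioid = [True, False, True]
group_2013 = [0.5, 0.3, 0.2]
group_2014 = [0.4, 0.4, 0.2]
freq_2013 = expected_type_frequency(group_2013, is_opioid)
avg_freq = average_over_years([group_2013, group_2014], is_opioid)
```

Swapping `is_opioid` for an antibiotic or antipsychotic indicator gives the corresponding frequency for those drug types.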
Figure 2.
Expected opioid frequencies for each drug group averaged over the years (left) – The posterior frequency estimate of several of the top drugs within drug group #22 (right). Note that the plot on the right is scaled to demonstrate the trend of the drugs' posterior probability.
We can extend this analysis to detect temporal changes of billing patterns for medical specialties. The left panel of Figure 3 displays the variability of expected billing behavior from drug group 22 across years for a number of medical specialties. Some specialties are recognized to commonly bill for this drug group and some rarely do. For instance, as it can be expected, anesthesiologists have higher billing proportions for drug group 22 with a relatively smaller variability, whereas interventional pain management doctors and neurosurgeons have higher variabilities. The median expected billing proportion for neurosurgeons is lower than their mean, indicating the existence of some providers billing relatively high amounts compared to their peers. In addition, we can analyze the trends as shown in the right panel of Figure 3. For example, one can observe the upward trend of the expected billing of neurosurgeons for this drug group from 2013 to 2016. The same analysis could be applied to all medical specialties across all drug groups for different years.
Figure 3.
Expected billing proportion of each medical specialty from drug group #22 across years (left) – Billing proportion trends of several medical specialties from drug group #22 (right).
The vast amount of information that could be captured by the model, as well as the multi-dimensionality of the analysis, could be overwhelming for a decision-maker. This is where a temporal multicriteria anomaly detection approach could help in summarizing all this information.
3.3. Anomaly detection
One of the novelties of the proposed framework is the ability to detect hidden patterns revealed by the temporal topic model and summarize vast amount of information into simple metrics. The introduced measures are Gini score of hidden prescription patterns (Equation (2)), Gini score of cost of the prescribed drugs (Equation (3)) and one-sided time-aggregated opioid score (Equation (4)). Gini area of concentration ranges between 0 and 1, with larger values indicating larger discrepancies between the individual's prescription pattern/cost and the benchmark. The benchmark distributions are based on medical specialty, prescription year, and average beneficiaries' risk scores, as described in the previous section.
The time-aggregated distributions for these measures are shown in Figure 4. They are calculated at the year level and then aggregated with equal weights following the weighted-average approach. Each measure captures anomalies in a particular direction, with a specific focus on prescription pattern, prescription cost, or over-prescription of opioids. For example, a comparison of the skewness of the Gini score distributions reveals more discrepancies for prescription frequencies (i.e. the distribution is slightly left-skewed with higher scores). This could lead to the flagging of more prescribers in terms of prescription patterns.
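The year-level aggregation can be sketched as a normalized weighted average. This is a minimal illustration with made-up scores. The function name `time_aggregate` and the recency weights are hypothetical, and equal weights are the default, as used for Figure 4.

```python
def time_aggregate(yearly_scores, weights=None):
    """Weighted average of year-level scores.

    yearly_scores: dict mapping year -> score for one provider.
    weights: optional dict mapping year -> non-negative weight; equal weights if omitted.
    """
    years = sorted(yearly_scores)
    if weights is None:
        weights = {year: 1.0 for year in years}
    total = sum(weights[year] for year in years)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[year] * yearly_scores[year] for year in years) / total


# Made-up Gini scores for one provider, 2013 to 2017.
scores = {2013: 0.60, 2014: 0.50, 2015: 0.40, 2016: 0.30, 2017: 0.20}

equal_weighted = time_aggregate(scores)
# Hypothetical auditor choice: more recent years count more.
recency = {2013: 1, 2014: 2, 2015: 3, 2016: 4, 2017: 5}
recency_weighted = time_aggregate(scores, recency)
```

With these numbers the equal-weight score is 0.40, while the recency-weighted score is lower because the provider's discrepancy shrinks over time.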
Figure 4.
Distribution of time-aggregated Gini scores based on prescription patterns (left), time-aggregated Gini scores based on prescription costs (middle) and time-aggregated opioid scores (right).
In addition, the proposed framework offers the ability to capture the time-evolutionary aspect of prescription patterns. In particular, the standard deviation of Gini scores across years can be computed to understand the variation over time for each medical provider. Figure 5 displays the time-aggregated Gini scores of prescription patterns versus their standard deviation across the 5 years. Two groups could be of particular interest. The first group consists of prescribers with high Gini values and low standard deviations, i.e. the ones in the lower right corner of the plot. Prescribers in this group have consistently behaved differently from their peers across all the years. This could be due to legitimate reasons such as a specialized practice, but it could also be a sign of consistently questionable behavior. The second group of interest consists of prescribers with high standard deviations. Of those, the top three are highlighted to be analyzed further.
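The two groups described above can be picked out with a few lines of code. The sketch below is illustrative only. It assumes equal-weight time aggregation and the population standard deviation (the text does not state which variant is used), and the cut-offs `gini_min` and `sd_max` as well as the provider data are hypothetical.

```python
from statistics import mean, pstdev


def summarize(yearly):
    """Map provider id -> (time-aggregated Gini, standard deviation across years)."""
    return {pid: (mean(scores), pstdev(scores)) for pid, scores in yearly.items()}


def consistently_different(summary, gini_min, sd_max):
    """Providers with a high aggregated Gini score and low variation over time."""
    return sorted(
        pid for pid, (gini, sd) in summary.items() if gini >= gini_min and sd <= sd_max
    )


def most_volatile(summary, top_n=3):
    """Providers with the largest standard deviation of yearly Gini scores."""
    return sorted(summary, key=lambda pid: summary[pid][1], reverse=True)[:top_n]


# Made-up yearly Gini scores (2013 to 2017) for four providers.
yearly = {
    "A": [0.80, 0.82, 0.79, 0.81, 0.80],  # consistently far from peers
    "B": [0.90, 0.70, 0.50, 0.30, 0.10],  # converging towards peers
    "C": [0.20, 0.20, 0.20, 0.20, 0.20],  # consistently similar to peers
    "D": [0.70, 0.70, 0.10, 0.70, 0.70],  # one atypical year
}
summary = summarize(yearly)
stable_outliers = consistently_different(summary, gini_min=0.7, sd_max=0.05)
volatile = most_volatile(summary, top_n=2)
```

Here provider A falls in the first group, while B and D have the highest standard deviations and would be candidates for a year-by-year look at their concentration curves.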
Figure 5.

Time-aggregated Gini scores of prescription versus the standard deviation of Gini score across years. The top three providers with highest standard deviations are highlighted.
Figure 6 shows concentration curves for each of the three highlighted prescribers (shown in Figure 5 with the symbols +, ×, and *). These plots include concentration curves for the aggregated prescription behavior of the prescriber of interest versus their peers of the same medical specialty across all 17 drug groups for a given year. The lines in each plot correspond to the aggregate behavior of all doctors of the same medical specialty as the highlighted prescriber, and serve as the benchmark. As explained via Equation (1), for each medical provider and a given year, Gini's area of concentration score is calculated as the distance between the concentration curve and the straight line. A high standard deviation of the Gini scores across years is a sign of volatility in the provider's discrepancy rank over time. For example, we see that the prescriber with the highest standard deviation (an orthopedic surgeon highlighted in the top row of Figure 6) had the most different prescription behavior in 2013 but a prescription behavior similar to others in 2017. The reverse pattern is observed for another orthopedic surgeon (highlighted in the second row of Figure 6), whose Gini score increases over the years. The third row highlights concentration curves for a hematologist who has high Gini scores in all years except 2015, when the Gini score is low. Each of these patterns demonstrates a different behavior and could be of interest to auditors depending on their audit objectives. The different behaviors over time indicate the importance of determining the weight given to the discrepancy in each year. The auditor has the flexibility to choose those weights based on experience. For instance, the Gini scores from five years ago may not be deemed as important as those of the current year.
Figure 6.
Concentration curves of the medical providers with highest standard deviation of Gini prescription scores across years. The bold dashed line in each row corresponds to one of the highlighted providers in Figure 5: top row corresponds to ‘+’, middle row corresponds to ‘x’, and the bottom row corresponds to ‘*’.
This analysis provides a comprehensive perspective over the years in terms of the particular criterion of interest, namely prescription frequencies. But of these doctors, which ones are also flagged in terms of the cost of prescriptions? This is why we propose a decision frontier visual tool to help with anomaly detection that can consider different measures and time aggregation simultaneously.
3.4. Decision frontiers
In practice, it is often of auditors' interest to investigate medical providers considering multiple attributes possibly aggregated over time. Therefore, we present the use of graphical decision frontier tools that further enhance the anomaly detection task.
First, we demonstrate it for combinations of two criteria using the joint threshold approach; see Figure 7. In this plot, each point corresponds to the estimated metric values for the given pair (i.e. the metrics on the x and y axes) for a medical provider. The decision frontier curves are drawn to indicate varying risk levels, where all the points above the curves are worthy of further audit with increasing urgency. Three lines corresponding to three percentiles (75th, 90th, and 95th) are drawn, where, in all plots, the upper rightmost curve corresponds to the 95th percentile. The plot in the left panel of Figure 7 shows the thresholds for the two Gini measures. However, this approach can be extended to compare prescribers in terms of other measures as well.
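As a rough illustration of how nested frontiers translate into audit tiers, the sketch below assigns each provider the highest percentile frontier it exceeds. The functional form of the joint threshold is defined earlier in the paper and is not reproduced here; purely for illustration, this sketch assumes the joint score is the product of the two metrics, so each frontier is a level curve of that product. The helper names `percentile` and `joint_risk_tiers`, the `combine` argument, and the data are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def joint_risk_tiers(points, levels=(75, 90, 95), combine=lambda x, y: x * y):
    """Map provider id -> highest percentile frontier exceeded (None if below all).

    points: dict mapping provider id -> (metric_x, metric_y).
    combine: ASSUMED joint score; swap in the frontier definition actually used.
    """
    scores = {pid: combine(x, y) for pid, (x, y) in points.items()}
    cuts = [(lvl, percentile(list(scores.values()), lvl)) for lvl in sorted(levels)]
    tiers = {}
    for pid, score in scores.items():
        tier = None
        for lvl, cut in cuts:
            if score > cut:  # strictly above the frontier
                tier = lvl
        tiers[pid] = tier
    return tiers


# Made-up pairs of (pattern Gini, cost Gini) for nine providers.
points = {f"p{i}": (i / 10.0, i / 10.0) for i in range(1, 10)}
tiers = joint_risk_tiers(points)
```

In this toy example only the two providers with the largest joint scores exceed a frontier, one at the 75th and one at the 95th percentile level, which mirrors the "increasing urgency" reading of the nested curves.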
Figure 7.
Pair-wise plots of the three different measures.
For example, as explained in Section 2.2.3, one can estimate a metric for excessive prescriptions of a certain drug type and plot either of the Gini measures against that metric. To demonstrate, we use the estimated time-aggregated opioid scores for each prescriber. Following this approach, we can analyze prescribers across different plots. We showcase this for two medical providers, shown with a square and a triangle in Figure 7. Both prescribers are found to be in the top 5% considering the joint threshold with respect to the prescription frequency and cost discrepancy scores; see the left panel of Figure 7. Auditors can then look at either of the plots in the middle or on the right to see where these prescribers stand in terms of opioid over-prescription discrepancy. In this case, the medical provider shown with a square has one of the highest opioid score values and hence is a good candidate for further investigation. The provider shown with a triangle, however, does not have a high opioid score, indicating the absence of excessive prescriptions.
To further validate the results obtained using the proposed approach, we investigated the two providers shown with a square and a triangle in Figure 7. The boxplots of the opioid claim rate, opioid cost rate, and opioid beneficiary rate for the prescribers in each year between 2013 and 2017, together with the values for the two prescribers of interest, are displayed in Figure A1 in Appendix 2. As noted earlier, the proposed framework takes into account hidden relationships using prescription patterns as well as cost, and the covariates of medical specialty, year, and beneficiary risk score. Without directly utilizing the opioid-specific variables that are widely used in industry, our framework is able to find the outliers on both ends of the spectrum. The provider shown with a square has one of the highest opioid scores and is therefore a good candidate for further investigation, whereas the provider shown with a triangle does not have a high opioid score, indicating the absence of excessive prescriptions. These results are in line with the boxplots of the opioid claim rate, opioid cost rate, and opioid beneficiary rate for the prescribers.
Lastly, the approach based on individual percentile thresholds could be utilized to study each measure separately; see Figure 8. Here each rectangle includes prescribers with Gini values that are higher than the pre-specified percentile threshold for both measures. While we have considered equal thresholds in drawing this plot (i.e. they are both either the 75th, 90th, or 95th percentile), one can choose different thresholds for either of the criteria. This approach could be further used to detect prescribers who behave very differently with respect to different criteria. This would be the case of a doctor whose Gini measure with respect to prescriptions is in a very low percentile, whereas the one with respect to costs is in a very high percentile. Such a doctor prescribes like the entire population but charges in a completely different way, i.e. either much more or much less, for the same prescriptions. Analysis of this provider can be valuable when coupled with expert domain knowledge. For instance, this doctor could be billing more for brand drugs than for generic drugs. Such an investigation could be done graphically for two or even three criteria and numerically for any number of criteria. Overall, decision frontiers are practically convenient, comprehensive, and easy-to-communicate tools for analyzing multiple criteria simultaneously and helping auditors find investigation leads.
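The individual-threshold rule and the "discordant" case described above can be sketched as follows. This is an illustrative sketch with made-up scores. The helper names and the specific percentile levels passed in are hypothetical choices, not values from the paper.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def flag_rectangle(points, q_pattern, q_cost):
    """Providers above the chosen percentile on BOTH measures (the rectangular region)."""
    pattern_cut = percentile([p for p, _ in points.values()], q_pattern)
    cost_cut = percentile([c for _, c in points.values()], q_cost)
    return sorted(
        pid for pid, (p, c) in points.items() if p > pattern_cut and c > cost_cut
    )


def flag_discordant(points, q_pattern_low, q_cost_high):
    """Providers with a low pattern discrepancy but a high cost discrepancy."""
    pattern_cut = percentile([p for p, _ in points.values()], q_pattern_low)
    cost_cut = percentile([c for _, c in points.values()], q_cost_high)
    return sorted(
        pid for pid, (p, c) in points.items() if p < pattern_cut and c > cost_cut
    )


# Made-up (pattern Gini, cost Gini) pairs for five providers.
points = {
    "a": (0.10, 0.95),  # prescribes like peers, very different cost
    "b": (0.20, 0.20),
    "c": (0.50, 0.50),
    "d": (0.85, 0.90),
    "e": (0.90, 0.92),
}
both_high = flag_rectangle(points, q_pattern=60, q_cost=60)
discordant = flag_discordant(points, q_pattern_low=25, q_cost_high=75)
```

Passing different percentiles for the two criteria corresponds to drawing a non-square rectangle, and the discordant rule picks out the kind of provider for whom a brand-versus-generic review might be informative.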
Figure 8.
Rectangular regions representing points above certain percentiles for either of the criteria. Three rectangles represent 75th, 90th, and 95th percentile thresholds for both of the measures.
4. Conclusion
The growing budget deficits and the drug abuse health crisis have put a spotlight on prescription fraud and abuse. Analytical methods and decision tools can be used to detect anomalies and thereby aid health care auditors. This manuscript presents an integrated statistical learning and decision framework based on multicriteria concentration curves for drug prescription anomaly decision making over time. In our illustration, we use the real-world Medicare Part D prescription data from New Hampshire over five years. Structural topic models with covariates are used to retrieve temporal probabilistic drug groups. Then, anomalous prescription behavior is measured using combined multicriteria concentration curves and Gini measures that consider the weighted impact of temporal observations, in addition to the Gini measures for the cost. The decision frontier utilizes this output and enables the health care practitioner to assess the trade-offs among different criteria and identify audit leads to detect abusive prescribers. The temporal associations among providers and prescribed drugs can indicate new patterns that enable auditors to discover potential cases of drug diversion over time.
The contributions of this paper are threefold. First, the impact of time is considered in the construction of probabilistic clusters (i.e. drug groups), which helps to evaluate the temporal changes in prescription patterns. Second, it extends the univariate concentration functions so that multiple concentration curves can be considered. The utilized post-averaging approach computes the Gini area of concentration using individual likelihood ratios for each year, and then combines them through their weighted average. Lastly, this unsupervised modeling output is embedded into graphical decision frontiers, which consider multiple criteria. To the best of our knowledge, the proposed integrated framework is the first method that enables health care auditors to assess multiple criteria, such as cost and prescription patterns retrieved from an unsupervised learning method, in order to detect anomalies.
Many extensions could be considered depending on the practical objectives of the auditor. We are currently studying the different notions of multivariate Lorenz curves proposed in the literature and looking at their possible use and adaptation to the case of health care. For instance, an alternative way to consider multiple concentration functions could be a mixture method that computes a time-weighted Gini to construct one joint concentration curve. We are also exploring other numerical and graphical tools based on ranks. The proposed integrated framework is modular in that parts of it can complement the existing industry tools, and it can also be utilized with different time frames, such as monthly billings. Decision criteria could also include the cost of false positives and/or false negatives, which could be an additional aspect of the trade-off analysis with this decision frontier.
Finally, while this analysis is done for one state, further studies could follow the same approach to analyze similar data for all the states, where different patterns may be revealed. In addition, we used the publicly available data, which is aggregated for identifiability purposes. Researchers who have access to detailed claims data can utilize more detailed information about the characteristics of medical providers, their billing patterns, and their beneficiaries. This can further refine the results and consequently the audit leads.
Availability of data and material (data transparency)
The data used in this study is publicly available on the CMS website, as referenced in [13] along with its hyperlink.
Code availability (software application or custom code)
This analysis is fully coded in the R programming language. Whenever needed, we can make the full code (a total of six R files) available and/or deposit it in a public repository.
Appendix.
Appendix 1
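Structural topic models are generative statistical latent variable models where the objective is to find the parameters $\theta$ and $\beta$ that provide the maximum likelihood of obtaining the data of drug prescriptions, $w_{d,n}$, where each $w_{d,n}$ corresponds to an instance from the comprehensive list of drugs, indexed by $v \in \{1, \ldots, V\}$ (the notation follows [37]). Here, $\theta_d$ is assumed to follow a logistic normal distribution with mean $\mu_d = X_d \gamma$ and variance $\Sigma$, where $\gamma$ is a coefficient matrix. The mean function depends on the document-level covariates $X_d$, where each of the $K-1$ components of $\gamma$, $\gamma_k$, is assumed to have a multivariate Normal distribution with mean 0 and a fixed variance, $\sigma_k^2$, for each coefficient with zero covariances. In order to do that, we construct latent variables, $z_{d,n}$, corresponding to the particular drug group assigned to the $n$th drug billed by the $d$th prescriber, for each $n \in \{1, \ldots, N_d\}$, where $N_d$ is the total number of drugs prescribed by the $d$th provider. Each $z_{d,n}$ is sampled from a multinomial distribution over drug groups with probabilities $\theta_d$. Given the drug group $z_{d,n}$, the observed drug $w_{d,n}$ is drawn from a multinomial distribution with probabilities $\beta_{d, z_{d,n}}$ over the comprehensive list of drugs.

In other words, calculating $\beta_{d,k,v}$ provides the probability of observing drug $v$ given drug group $k$. It is modeled at the document covariate level as $\beta_{d,k,v} \propto \exp\left(m_v + \kappa^{(t)}_{k,v} + \kappa^{(c)}_{y_d,v} + \kappa^{(i)}_{y_d,k,v}\right)$ for each $k \in \{1, \ldots, K\}$ and $v \in \{1, \ldots, V\}$, where $y_d$ denotes the level of the observed covariate. These log-transformed rate deviations from the corpus-wide background distributions are specified as functions of drug groups, of the observed covariate (levels of time), and of drug group-covariate interactions. For instance, the term $\kappa^{(t)}_{k,v}$ contains the log-transformed rate deviations for each drug group $k$ and drug $v$, over the baseline log-transformed rate $m_v$ shared over all time periods.

The full data generating process for prescriber $d$, given $K$ drug groups, observed drug prescriptions $w_{d,n}$, and prescriber specialty and time as covariates, is summarized as follows:

$$
\begin{aligned}
\gamma_k &\sim \mathrm{Normal}\left(0, \sigma_k^2 I\right), \quad k = 1, \ldots, K-1, \\
\theta_d &\sim \mathrm{LogisticNormal}\left(X_d \gamma, \Sigma\right), \\
z_{d,n} &\sim \mathrm{Multinomial}\left(\theta_d\right), \quad n = 1, \ldots, N_d, \\
w_{d,n} &\sim \mathrm{Multinomial}\left(\beta_{d, z_{d,n}}\right), \quad n = 1, \ldots, N_d, \\
\beta_{d,k,v} &= \frac{\exp\left(m_v + \kappa^{(t)}_{k,v} + \kappa^{(c)}_{y_d,v} + \kappa^{(i)}_{y_d,k,v}\right)}{\sum_{v'=1}^{V} \exp\left(m_{v'} + \kappa^{(t)}_{k,v'} + \kappa^{(c)}_{y_d,v'} + \kappa^{(i)}_{y_d,k,v'}\right)}.
\end{aligned}
$$

The full distribution of the parameters of interest is intractable, and the inference is complicated by the non-conjugacy of the logistic Normal distribution with the multinomial likelihood. Hence, the partially collapsed variational expectation–maximization algorithm of [37] is utilized, in which Laplace approximations are used to estimate the expectations that are intractable due to the non-conjugate portion of the model. We refer readers to their discussion for details.

The output of temporal probabilistic clusters of drug prescriptions is reported in the form of $\beta_{k,v}$ for each time period $t$, whereas the topic proportions for each prescriber are given by $\theta_d$, which corresponds to $\theta_{d,k}$ reported for each time period $t$.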
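The generative process described in this appendix can be simulated in a few lines, which may help in reading the roles of the drug group proportions, the drug group assignments, and the drug distributions. The sketch below is a simplified illustration and not the estimation procedure of [37]. It assumes a diagonal covariance with a common standard deviation `sigma`, fixes the last logit at zero for identifiability, and uses hypothetical names (`softmax`, `drug_distribution`, `sample_prescriber`) and made-up parameter values throughout.

```python
import math
import random


def softmax(values):
    """Map real-valued scores to probabilities that sum to one."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def drug_distribution(baseline, dev_group, dev_time, dev_interaction):
    """Drug probabilities for one (drug group, time level) pair.

    Proportional to exp(baseline + group deviation + time deviation + interaction),
    all given as per-drug log-rate terms.
    """
    return softmax(
        [b + g + t + i for b, g, t, i in zip(baseline, dev_group, dev_time, dev_interaction)]
    )


def sample_prescriber(x_d, gamma, sigma, betas, n_drugs, rng):
    """Simulate drug group proportions and prescriptions for one prescriber.

    x_d: covariate vector; gamma: K-1 coefficient vectors; betas: K drug distributions.
    ASSUMPTION: independent Gaussian logits with common standard deviation `sigma`.
    """
    n_groups = len(betas)
    logits = [
        rng.gauss(sum(x * g for x, g in zip(x_d, gamma[k])), sigma)
        for k in range(n_groups - 1)
    ] + [0.0]  # last logit fixed at zero
    theta = softmax(logits)  # logistic normal draw of drug group proportions
    draws = []
    for _ in range(n_drugs):
        z = rng.choices(range(n_groups), weights=theta)[0]  # drug group assignment
        w = rng.choices(range(len(betas[z])), weights=betas[z])[0]  # observed drug
        draws.append((z, w))
    return theta, draws


# Made-up setup: K = 3 drug groups, V = 4 drugs, two covariates.
baseline = [0.0, -0.5, -1.0, -1.5]
group_devs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.5, 0.0, 0.0], [0.0, 0.0, 0.5, 2.0]]
time_dev = [0.1, 0.0, -0.1, 0.0]
no_interaction = [0.0, 0.0, 0.0, 0.0]
betas = [drug_distribution(baseline, g, time_dev, no_interaction) for g in group_devs]

rng = random.Random(0)
theta, draws = sample_prescriber(
    x_d=[1.0, 0.0], gamma=[[0.5, -0.2], [0.1, 0.3]], sigma=0.5,
    betas=betas, n_drugs=20, rng=rng,
)
```

Estimation runs in the opposite direction: given only the observed drugs, the variational algorithm recovers the proportions and the drug distributions that are simulated here.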
Appendix 2
The boxplots of the opioid claim rate, opioid cost rate, and opioid beneficiary rate for the prescribers in each year between 2013 and 2017, together with the values for the prescribers of interest, are displayed in Figure A1.
Figure A1.

Comparison of flagged providers of Figure 7 within their specialty (Nurse Practitioner) across 5 years.
Funding Statement
T.E. acknowledges the support of the Texas State University through the Faculty Development Leave and Presidential Research Leave Award and through the Health Scholar Showcase award from the Translational Health Research Center. This work was partially supported by the National Science Foundation under grant DMS-1638521 to the Statistical and Applied Mathematical Sciences Institute.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Aral K.D., Güvenir H.A., Sabuncuoğlu İ., and Akar A.R., A prescription fraud detection model, Comput. Methods Programs Biomed. 106 (2012), pp. 37–46.
- 2. Arnold B.C. and Sarabia J.M., Analytic expressions for multivariate Lorenz surfaces, Sankhya A 80 (2018), pp. 84–111.
- 3. Arora S., Ge R., Halpern Y., Mimno D., Moitra A., Sontag D., Wu Y., and Zhu M., A practical algorithm for topic modeling with provable guarantees, in International Conference on Machine Learning, PMLR, 2013, pp. 280–288.
- 4. Bauder R., da Rosa R., and Khoshgoftaar T., Identifying Medicare provider fraud with unsupervised machine learning, in 2018 IEEE International Conference on Information Reuse and Integration (IRI), IEEE, 2018, pp. 285–292.
- 5. Bauder R.A. and Khoshgoftaar T.M., A probabilistic programming approach for outlier detection in healthcare claims, in 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2016, pp. 347–354.
- 6. Blei D.M., Ng A.Y., and Jordan M.I., Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003), pp. 993–1022.
- 7. Brownstein J.S., Green T.C., Cassidy T.A., and Butler S.F., Geographic information systems and pharmacoepidemiology: Using spatial cluster detection to monitor local patterns of prescription opioid abuse, Pharmacoepidemiol. Drug Safety 19 (2010), pp. 627–637.
- 8. Capelleveen G.C., Outlier based predictors for health insurance fraud detection within US Medicaid, Master's thesis, University of Twente, 2013.
- 9. CDC, Drug overdose deaths, Centers for Disease Control and Prevention, 2018. Available at https://www.cdc.gov/drugoverdose/data/statedeaths.html [accessed 22 June 2020].
- 10. CDC, HUS 2018 trend tables, Centers for Disease Control and Prevention, 2018. Available at https://www.cdc.gov/nchs/data/hus/2018/038.pdf [accessed 3 June 2020].
- 11. Cifarelli D.M. and Regazzini E., On a general definition of concentration function, Sankhyā 0 (1987), pp. 307–319.
- 12. CMS, NHE fact sheet, The Centers for Medicare & Medicaid Services, 2020. Available at https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/NHE-Fact-Sheet [accessed 10 August 2020].
- 13. CMS, Part D prescriber data CY 2013 to 2017, Centers for Medicare and Medicaid Services, 2020. Available at https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Part-D-Prescriber [accessed 22 June 2020].
- 14. Ekin T., Statistics and Health Care Fraud: How to Save Billions, CRC Press, Boca Raton, FL, 2019.
- 15. Ekin T., Frigau L., and Conversano C., Health care fraud classifiers in practice, Appl. Stoch. Models Bus. Ind. (2021).
- 16. Ekin T., Ieva F., Ruggeri F., and Soyer R., Application of Bayesian methods in detection of healthcare fraud, Chem. Eng. Trans. 33 (2013), pp. 151–156.
- 17. Ekin T., Ieva F., Ruggeri F., and Soyer R., On the use of the concentration function in medical fraud assessment, Am. Stat. 71 (2017), pp. 236–241.
- 18. Ekin T., Ieva F., Ruggeri F., and Soyer R., Statistical medical fraud assessment: Exposition to an emerging field, Int. Stat. Rev. 86 (2018), pp. 379–402.
- 19. Ekin T., Lakomski G., and Musal R.M., An unsupervised Bayesian hierarchical method for medical fraud assessment, Stat. Anal. Data Min. 12 (2019), pp. 116–124.
- 20. Haddad Soleymani M., Yaseri M., Farzadfar F., Mohammadpour A., Sharifi F., and Kabir M.J., Detecting medical prescriptions suspected of fraud using an unsupervised data mining algorithm, DARU J. Pharm. Sci. 26 (2018), pp. 209–214.
- 21. Iyengar V.S., Hermiz K.B., and Natarajan R., Computer-aided auditing of prescription drug claims, Health Care Manag. Sci. 17 (2014), pp. 203–214.
- 22. Johnson M.E. and Nagarur N., Multi-stage methodology to detect health insurance claim fraud, Health Care Manag. Sci. 19 (2016), pp. 249–260.
- 23. Kaye A.D., Jones M.R., Kaye A.M., Ripoll J.G., Galan V., Beakley B.D., Calixto F., Bolden J.L., Urman R.D., and Manchikanti L., Prescription opioid abuse in chronic pain: An updated review of opioid abuse predictors and strategies to curb opioid abuse: Part 1, Pain Physician 20 (2017), pp. S93–S109.
- 24. KFF, New Hampshire drug overdose deaths, Kaiser Family Foundation's State Health Facts, 2020. Available at https://www.kff.org/state-category/health-status/?state=nh [accessed 22 July 2020].
- 25. Kose I., Gokturk M., and Kilic K., An interactive machine-learning-based electronic fraud and abuse detection system in healthcare insurance, Appl. Soft Comput. 36 (2015), pp. 283–299.
- 26. Koshevoy G. and Mosler K., The Lorenz zonoid of a multivariate distribution, J. Am. Stat. Assoc. 91 (1996), pp. 873–882.
- 27. Lee S., A new preparedness policy for EMS logistics, Health Care Manag. Sci. 20 (2017), pp. 105–114.
- 28. Li J., Huang K.-Y., Jin J., and Shi J., A survey on statistical methods for health care fraud detection, Health Care Manag. Sci. 11 (2008), pp. 275–287.
- 29. Liu J., Bier E., Wilson A., Guerra-Gomez J.A., Honda T., Sricharan K., Gilpin L., and Davies D., Graph analysis for detecting fraud, waste, and abuse in healthcare data, AI Mag. 37 (2016), pp. 33–46.
- 30. Lu F. and Boritz J.E., Detecting fraud in health insurance data: Learning to model incomplete Benford's law distributions, in European Conference on Machine Learning, Springer, 2005, pp. 633–640.
- 31. MacLean L.C. and Richman A., Resource absorption in a health service system, Health Care Manag. Sci. 4 (2001), pp. 337–345.
- 32. Marshall A.W., Olkin I., and Arnold B.C., Inequalities: Theory of Majorization and its Applications, Vol. 143, Springer, New York, NY, 1979.
- 33. Mimno D., Wallach H.M., Talley E., Leenders M., and McCallum A., Optimizing semantic coherence in topic models, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2011, pp. 262–272.
- 34. Nordmann S., Pradel V., Lapeyre-Mestre M., Frauger E., Pauly V., Thirion X., Mallaret M., Jouanjus E., and Micallef J., Doctor shopping reveals geographical variations in opioid abuse, Pain Physician 16 (2013), pp. 89–100.
- 35. OIG, Ensuring the integrity of Medicare Part D report (oei-03-15-00180), The Office of Inspector General, 2015. Available at https://oig.hhs.gov/oei/reports/oei-03-15-00180.asp [accessed 1 October 2017].
- 36. OIG, Examining efforts to prevent opioid overutilization and misuse in Medicare and Medicaid, Office of Inspector General, U.S. Department of Health and Human Services, 2018. Available at https://oig.hhs.gov/testimony/docs/2018/dixon-testimony052918.pdf [accessed 3 June 2020].
- 37. Roberts M.E., Stewart B.M., and Airoldi E.M., A model of text for experimentation in the social sciences, J. Am. Stat. Assoc. 111 (2016), pp. 988–1003.
- 38. Shan Y., Murray D.W., and Sutinen A., Discovering inappropriate billings with local density based outlier detection method, in Proceedings of the Eighth Australasian Data Mining Conference, Vol. 101, Australian Computer Society, Inc., Australia, 2009, pp. 93–98.
- 39. Taguchi T., On the structure of multivariate concentration-some relationships among the concentration surface and two variate mean difference and regressions, Comput. Stat. Data Anal. 6 (1988), pp. 307–334.
- 40. Whynes D.K. and Thornton P., Measuring concentration in primary care, Health Care Manag. Sci. 3 (2000), pp. 43–49.
- 41. Zafari B. and Ekin T., Topic modelling for medical prescription fraud and abuse detection, J. R. Stat. Soc. Ser. C (Appl. Stat.) 68 (2019), pp. 751–769.
- 42. Zimek A., Schubert E., and Kriegel H.-P., A survey on unsupervised outlier detection in high-dimensional numerical data, Stat. Anal. Data Min. 5 (2012), pp. 363–387.







