J Am Med Inform Assoc. 2017 Jul 18;24(6):1204–1210. doi: 10.1093/jamia/ocx066

Table 1.

Descriptions of studies

Birt et al. (2014)
  Data sources: Prescription drug claims data from a large commercial insurance plan (MarketScan database)
  Study design, number in study sample, and time period: Cohort; 6 291 810 patients; 2008–2009
  Target population: Individual medication
  Algorithm development method: Comparison of Lorenz curves at 1% and 50% values, medication possession ratio, and proportion of days covered values for medications prone to nonmedical use vs medications that are not
  Reported algorithm performance metrics: Discriminatory ability of each metric using the C-statistic; concordance of medication possession ratio vs proportion of days covered vs Lorenz 1% and Lorenz 50%

Parente et al. (2004)
  Data sources: Prescription drug claims data from a multistate database with a mix of health plan types, including indemnity fee-for-service plans, preferred provider organizations, independent practice associations, and health maintenance organizations
  Study design, number in study sample, and time period: Cohort; 7 million patients; 2000
  Target population: Patients (primarily) or prescribers
  Algorithm development method: Controlled substance patterns of utilization requiring evaluation system; top 10 patterns evaluated
  Reported algorithm performance metrics: Algorithm sensitivity; pseudo R² of model

Sullivan et al. (2010)
  Data sources: Administrative claims data, including demographic, clinical, and drug utilization data; commercial insurance and Medicaid insurance databases
  Study design, number in study sample, and time period: Cohort; 31 845 patients; 2000–2005
  Target population: Patients
  Algorithm development method: Polytomous logistic regression; test for linear trend
  Reported algorithm performance metrics: Criterion validity: test for linear trend

Yang et al. (2015)
  Data sources: Prescription drug claims data from a multistate Medicaid database
  Study design, number in study sample, and time period: Cohort; 90 010 patients; 2008–2010
  Target population: Patients
  Algorithm development method: 2 indicators: pharmacy shopping indicator and overlapping prescription indicator
  Reported algorithm performance metrics: Diagnostic odds ratio, defined as the ratio of the odds of being identified as having nonmedical use when the patient actually has nonmedical use (true positives) to the odds of being identified as having nonmedical use when the patient does not have nonmedical use (false positives); criterion validity of the 2 indicators

Mailloux et al. (2010)
  Data sources: Prescription drug claims data from Wisconsin's Medicaid database
  Study design, number in study sample, and time period: Cohort; 190 patients; 1998–1999
  Target population: Patients
  Algorithm development method: Decision tree
  Reported algorithm performance metrics: Sensitivity, specificity, positive predictive value, negative predictive value; validation attempts (concordance between results and gold standard as defined)

Iyengar et al. (2014)
  Data sources: Prescription drug and medical claims data from a large health insurance plan
  Study design, number in study sample, and time period: Cohort; 2.3 million patients; 99 000 prescribers; 2011
  Target population: Patients and providers
  Algorithm development method: Auditing analysis
  Reported algorithm performance metrics: Area under the receiver operating characteristic curve (C-statistic)

White et al. (2009)
  Data sources: Administrative claims data, including demographic, clinical, and drug utilization data from a private health insurance plan
  Study design, number in study sample, and time period: Case-control; 632 000 patients; 2005–2006
  Target population: Patients
  Algorithm development method: Logistic regression
  Reported algorithm performance metrics: C-statistic; pseudo R²; model parsimony

Rice et al. (2012)
  Data sources: Administrative claims data, including demographic, clinical, and drug utilization data from a private health insurance plan
  Study design, number in study sample, and time period: Case-control; 821 916 patients; 1999–2009
  Target population: Patients
  Algorithm development method: Logistic regression
  Reported algorithm performance metrics: C-statistic

Cochran et al. (2014)
  Data sources: Administrative claims data, including demographic, clinical, and utilization data from a large health insurance plan
  Study design, number in study sample, and time period: Case-control; 2 841 793 patients; 2000–2008
  Target population: Patients
  Algorithm development method: Logistic regression
  Reported algorithm performance metrics: Sensitivity

Dufour et al. (2014)
  Data sources: Administrative claims data, including demographic, clinical, and drug utilization data from Humana and Truven
  Study design, number in study sample, and time period: Case-control; 3567 patients; 2009–2011
  Target population: Patients
  Algorithm development method: Logistic regression
  Reported algorithm performance metrics: Sensitivity, specificity, positive predictive value, negative predictive value; validation attempts (concordance between results and gold standard as defined)

Carrell et al. (2015)
  Data sources: Free text from electronic health records; computer-assisted manual review of clinical notes
  Study design, number in study sample, and time period: Descriptive; 22 142 patients; 2006–2012
  Target population: Patients
  Algorithm development method: Natural language processing plus computer-assisted manual record review
  Reported algorithm performance metrics: False positive rate comparing the natural language processing plus computer-assisted manual review approach to traditional diagnostic codes in electronic health records

Hylan et al. (2015)
  Data sources: Free text from electronic health records
  Study design, number in study sample, and time period: Cohort; 2752 patients; 2008–2012
  Target population: Patients
  Algorithm development method: Logistic regression
  Reported algorithm performance metrics: Sensitivity, specificity, positive predictive value, negative predictive value; C-statistic

Epstein et al. (2011)
  Data sources: Automated drug dispensing carts and anesthesia information management systems from a hospital
  Study design, number in study sample, and time period: Case series; 158 providers; 2007–2011
  Target population: Providers
  Algorithm development method: Data mining
  Reported algorithm performance metrics: Sensitivity, specificity, positive predictive value, negative predictive value

Ringwalt et al. (2015)
  Data sources: Prescription drug monitoring program data from North Carolina
  Study design, number in study sample, and time period: Cohort; 33 635 providers; 2009–2013
  Target population: Providers
  Algorithm development method: Subject matter/clinical expertise
  Reported algorithm performance metrics: Criterion validity

Derrington et al. (2015)
  Data sources: ICD-9-CM codes from hospital discharge data in Massachusetts
  Study design, number in study sample, and time period: Descriptive; 1 728 027 patients; 2002–2008
  Target population: Patients
  Algorithm development method: Factor analysis
  Reported algorithm performance metrics: Number, percentage, and 95% confidence intervals for substance use disorders identified by the algorithm compared to the gold standard
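Two of the adherence metrics in the Birt et al. entry, medication possession ratio (MPR) and proportion of days covered (PDC), are computable directly from claims fill records. A minimal sketch under their standard definitions (total days' supply over the observation window for MPR; distinct covered days, counting overlaps once, for PDC); the fill dates, days' supply values, and window below are hypothetical, and individual studies may cap or window these differently:

```python
from datetime import date

def medication_possession_ratio(fills, start, end):
    """MPR: total days' supply dispensed divided by days in the window.
    Overlapping fills can push this above 1.0, so it is commonly capped."""
    window_days = (end - start).days + 1
    supplied = sum(days_supply for _, days_supply in fills)
    return min(supplied / window_days, 1.0)

def proportion_of_days_covered(fills, start, end):
    """PDC: distinct calendar days covered by at least one fill,
    divided by days in the window (overlapping coverage counted once)."""
    window_days = (end - start).days + 1
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date.toordinal() + offset
            if start.toordinal() <= day <= end.toordinal():
                covered.add(day)
    return len(covered) / window_days

# Hypothetical fills: (fill date, days' supply). The second fill overlaps
# the first, so MPR and PDC diverge.
fills = [(date(2009, 1, 1), 30), (date(2009, 1, 20), 30)]
start, end = date(2009, 1, 1), date(2009, 3, 1)
```

With these inputs, MPR counts all 60 supplied days against the 60-day window, while PDC counts the overlapping days only once, illustrating why Birt et al. compare the two metrics' concordance.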
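The validation metrics reported across these studies (sensitivity, specificity, positive and negative predictive value, and the diagnostic odds ratio defined in the Yang et al. entry) all derive from a 2×2 confusion matrix of algorithm flags against the gold standard. A sketch with hypothetical cell counts, not drawn from any of the studies above:

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard 2x2 confusion-matrix metrics against a gold standard.
    tp/fp/fn/tn: true positive, false positive, false negative, true
    negative counts (all assumed nonzero here for simplicity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    # Diagnostic odds ratio: odds of being flagged among true nonmedical
    # users divided by odds of being flagged among non-users;
    # algebraically equal to (tp * tn) / (fp * fn).
    dor = (tp / fn) / (fp / tn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "diagnostic_odds_ratio": dor}

# Hypothetical counts: algorithm flags vs a chart-review gold standard
m = classification_metrics(tp=80, fp=40, fn=20, tn=860)
```

A diagnostic odds ratio well above 1 indicates the algorithm's flags are concentrated among true cases; it is a single-number summary, which is why several studies also report sensitivity and specificity separately.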