Abstract
A problem in biosurveillance is how frequently to update controlled vocabularies that identify data elements such as laboratory tests and over-the-counter healthcare products. More frequent updates improve the completeness of data captured over time, but introducing new codes into a surveillance system may cause false alarms when codes are aggregated into analytic categories. We studied the effect of three policies for updating Universal Product Codes (UPCs), the controlled vocabulary for over-the-counter healthcare products used by the National Retail Data Monitor.
To compare different policies for updating, we analyzed historical data from two cities for the 18 product categories of the National Retail Data Monitor under annual, quarterly, or monthly UPC update policies. We measured the effect on data completeness and false alarm rate.
We found that the monthly update policy had the highest data completeness and produced the fewest additional false alarms.
Overall, monthly updating of UPCs was the superior policy.
INTRODUCTION
The ability to update the controlled vocabularies used by information systems when new codes become available (e.g., a LOINC code for a new laboratory test) is important. Without this functionality, the data received are not interpretable (e.g., a message from a laboratory system to a data repository may be incomprehensible).
This problem is especially important in biosurveillance, where data exchange between information systems located in multiple organizations is commonplace. Biosurveillance systems typically group these data into categories, aggregate them geographically, and form them into time series for epidemiological analysis. Changes to the underlying vocabulary can therefore alter these aggregates independently of any change in disease activity. For example, as new forms of testing for influenza replace existing methods, aggregate measures of influenza-testing activity may fluctuate in ways that suggest increases or decreases in disease activity that do not exist. The purpose of this study was to measure the effects on aggregate time-series data of updating the controlled vocabulary used by a biosurveillance system.
The National Retail Data Monitor (NRDM) is a biosurveillance system that collects and analyzes sales of over-the-counter healthcare products.1, 2 The NRDM organizes sales data by 18 product categories.
A product category is simply a set of similar products, such as cough syrups. The retail industry assigns every product a Universal Product Code (UPC); UPCs are thus the controlled vocabulary used by the NRDM. Manufacturers assign a new UPC whenever they release a new product or change an existing one.
Maintenance of product categories is an important function of the NRDM.2 Retailers send sales data to the NRDM for only those UPCs specified by the NRDM, which is accomplished through a UPC Master maintained by the NRDM. Without constant updating of the UPC Master with new UPCs, daily sales counts will fall over time as new products are introduced or existing products receive new UPCs.
Ideally, the NRDM would add new UPCs to the Master before products are released. However, even companies that analyze market share, such as ACNielsen, learn of new UPCs only after they have been scanned for the first time at a retail checkout.
Like ACNielsen, the NRDM must add UPCs to the Master in reactive mode. To do this, the NRDM must (1) discover new UPCs, and (2) add them to the Master with minimal impact on the stationarity of the sales data. The NRDM uses the discovery processes of ACNielsen to discover new UPCs.
Adding UPCs to the Master as soon as they are received from ACNielsen (at present, the NRDM receives updates quarterly) has the advantage of capturing as many sales as possible, thus maximizing the strength of the signal available to a surveillance system. However, a concern about this policy is introducing a spike in sales of a category that either a human or a statistical outbreak detection algorithm could mistake for an outbreak.
There are potential tradeoffs among different frequencies of adding UPCs to the Master. More frequent updating of UPCs would maintain a higher level of data completeness but might cause more false alarms than less frequent updating. Quarterly updates, for example, could produce four false alarms per year, whereas monthly updates could produce 12. On the other hand, monthly updating could produce fewer false alarms, because monthly updates may produce spikes too small to trigger alarms, whereas the larger spikes from quarterly updates may trigger them.
The optimal policy is one that maximizes the signal in the surveillance data and minimizes false alarms. The exact balance between improvement in signal and false alarms would be set by considerations of the benefit due to improved ability to detect outbreaks and the cost of false alarms.
Changes in the ability to detect outbreaks are difficult to measure, so in this study we measured the effect on data completeness, which is known to influence the earliness and sensitivity of detection, and used it as a surrogate for benefit. The cost of false alarms is also difficult to measure, so we measured the increase in false alarm rate instead (all systems have a baseline false alarm rate due to natural variability in data over time). We used these measures to compare several updating policies that differed in the frequency of updates. Our conjecture was that quarterly updates would be the superior policy given these measurements of cost and benefit.
METHODS
We retrospectively measured the effect of three update policies (annual, quarterly, and monthly), with no updating as a baseline, on data completeness and false alarm rate over a two-year period. The study used historical sales data from two metropolitan regions.
Data about New UPCs
ACNielsen provided a file of all UPCs introduced from January 2003 through September 2004 for 19 of its categories of UPCs (which differ from the NRDM’s 18 categories). The ACNielsen categories include all types of products relevant to the NRDM categories. The ACNielsen file included the date of introduction and a semi-structured description for each UPC. It also included a glossary of definitions of abbreviations used in these descriptions.
To assign each new UPC to one of the 18 NRDM categories, we applied a parser that we had developed previously to create the initial categories. The parser assigns a new UPC to a category using the ACNielsen abbreviations. When it fails to recognize an abbreviation (the abbreviation list is occasionally incomplete), it flags that UPC description for manual review. A person must then determine the meaning of the abbreviation (usually by requesting clarification from ACNielsen), add it to the catalog of known abbreviations, and rerun the parser. The process is complicated because abbreviations are separated by spaces yet may themselves contain spaces. In addition, although the brand always appears first in the description, nothing clearly delimits it from the rest of the description, which contains attributes such as flavor, dose form, and type of medication (e.g., decongestant).
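The parser's implementation is not described here. As a rough illustration only, the following Python sketch shows one way an abbreviation-driven parser could handle the two complications just described. The glossary entries, category rules, and function names are hypothetical; multi-word abbreviations are handled by greedy longest match, and leading tokens before the first recognized abbreviation are assumed to be the brand (a heuristic, not the NRDM's documented rule).

```python
from typing import Optional

# Hypothetical glossary (abbreviation -> term); the real ACNielsen glossary differs.
GLOSSARY = {
    "CGH": "cough",
    "SYR": "syrup",
    "CHLD": "pediatric",
    "LIQ": "liquid",
    "TAB": "tablet",
    "NON DRWS": "non-drowsy",  # an abbreviation that itself contains a space
}
MAX_WORDS = max(len(abbrev.split()) for abbrev in GLOSSARY)

# Hypothetical rules: (category, terms required, terms forbidden); first match wins.
RULES = [
    ("Cough syrup, pediatric, liquid", {"cough", "pediatric", "liquid"}, set()),
    ("Cough syrup, adult, tablet", {"cough", "tablet"}, {"pediatric"}),
]


def match_at(tokens: list[str], i: int) -> int:
    """Length (in tokens) of the longest known abbreviation starting at i, or 0."""
    for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
        if " ".join(tokens[i:i + n]) in GLOSSARY:
            return n
    return 0


def assign_category(description: str) -> tuple[Optional[str], list[str]]:
    """Return (category, unknown_tokens); any unknown token means manual review."""
    tokens = description.upper().split()
    # Brand heuristic (assumption): skip leading tokens until a known abbreviation.
    i = next((k for k in range(len(tokens)) if match_at(tokens, k)), len(tokens))
    terms, unknown = set(), []
    while i < len(tokens):
        n = match_at(tokens, i)
        if n:
            terms.add(GLOSSARY[" ".join(tokens[i:i + n])])
            i += n
        else:
            unknown.append(tokens[i])
            i += 1
    if unknown:
        return None, unknown  # flag the description for manual review
    for category, required, forbidden in RULES:
        if required <= terms and not (forbidden & terms):
            return category, []
    return None, []  # fully parsed, but fits no NRDM category
```

For example, `assign_category("ACME CGH XYZ TAB")` returns `(None, ["XYZ"])`, which corresponds to the manual-review step: once `XYZ` is added to the glossary, the description can be reparsed.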
From the week ending 01/04/2003 to the week ending 09/25/2004, there were 5,498 new UPCs in ACNielsen's 19 categories, 1,443 of which we assigned to NRDM categories. While assigning the new UPCs, we also found previously assigned UPCs that had been miscategorized: 76 (1% of UPCs) belonged in another category, and 33 (0.44%) belonged in no category. Because moving UPCs between categories does not change the total, the result was a net increase of 1,410 UPCs (1,443 − 33; Table 1).
Table 1.
Counts of UPCs in each NRDM category before and after adding new UPCs (net change = final set − initial set)
| Category | Initial set | Net change | Final set |
|---|---|---|---|
| Antipyretic, adult | 1340 | 291 | 1631 |
| Antipyretic, pediatric | 274 | 49 | 323 |
| Bronchial remedies | 43 | 32 | 75 |
| Chest rubs | 78 | 11 | 89 |
| Cold relief, adult, liquid | 709 | 85 | 794 |
| Cold relief, adult, tablet | 2467 | 455 | 2922 |
| Cold relief, pediatric, liquid | 323 | 74 | 397 |
| Cold relief, pediatric, tablet | 74 | 5 | 79 |
| Cough syrup, adult, liquid | 591 | 35 | 626 |
| Cough syrup, adult, tablet | 32 | 16 | 48 |
| Cough syrup, pediatric, liquid | 24 | 42 | 66 |
| Diarrhea remedies | 165 | 70 | 235 |
| Pediatric electrolytes | 76 | 40 | 116 |
| Hydrocortisones | 185 | 28 | 213 |
| Nasal remedies | 371 | 66 | 437 |
| Thermometers, adult | 315 | 64 | 379 |
| Thermometers, pediatric | 122 | −18 | 104 |
| Throat lozenges | 326 | 65 | 391 |
| Total | 7515 | 1410 | 8925 |
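Table 1 is internally consistent with the counts reported in the text: moving the 76 miscategorized UPCs between categories leaves the total unchanged, so the net increase equals the 1,443 newly assigned UPCs minus the 33 removed. The reported percentages also match the 7,515-UPC initial set as the denominator (our inference; the text does not state it). A short Python check of this arithmetic, with values copied from Table 1:

```python
# (initial set, net change, final set) for each NRDM category, from Table 1
TABLE_1 = {
    "Antipyretic, adult": (1340, 291, 1631),
    "Antipyretic, pediatric": (274, 49, 323),
    "Bronchial remedies": (43, 32, 75),
    "Chest rubs": (78, 11, 89),
    "Cold relief, adult, liquid": (709, 85, 794),
    "Cold relief, adult, tablet": (2467, 455, 2922),
    "Cold relief, pediatric, liquid": (323, 74, 397),
    "Cold relief, pediatric, tablet": (74, 5, 79),
    "Cough syrup, adult, liquid": (591, 35, 626),
    "Cough syrup, adult, tablet": (32, 16, 48),
    "Cough syrup, pediatric, liquid": (24, 42, 66),
    "Diarrhea remedies": (165, 70, 235),
    "Pediatric electrolytes": (76, 40, 116),
    "Hydrocortisones": (185, 28, 213),
    "Nasal remedies": (371, 66, 437),
    "Thermometers, adult": (315, 64, 379),
    "Thermometers, pediatric": (122, -18, 104),
    "Throat lozenges": (326, 65, 391),
}
NEW_ASSIGNED = 1443  # new UPCs assigned to NRDM categories
REMOVED = 33         # existing UPCs found to belong in no category
MOVED = 76           # existing UPCs moved between categories (total unchanged)

# Every row should satisfy initial + net change == final.
rows_consistent = all(i + d == f for i, d, f in TABLE_1.values())
initial, change, final = (sum(col) for col in zip(*TABLE_1.values()))

print(rows_consistent, initial, change, final)  # True 7515 1410 8925
print(NEW_ASSIGNED - REMOVED)                    # 1410
print(f"{MOVED / initial:.1%} moved, {REMOVED / initial:.2%} removed")  # 1.0% moved, 0.44% removed
```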
We analyzed the number of new UPCs per week to assess whether new UPCs were created more often at certain times of the year or month. No month or quarter of the year, and no week of the month, had consistently higher numbers of new UPCs.
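The text does not say how this tabulation was done. A minimal Python sketch, assuming the new-UPC counts are available as (week-ending date, count) pairs, tallies them by month and quarter within each year and by week of the month; consistency across years could then be judged by comparing, for example, January 2003 with January 2004. The function names, the week-of-month convention, and the choice to attribute each week to the month of its week-ending date are illustrative assumptions.

```python
from collections import Counter
from datetime import date


def week_of_month(d: date) -> int:
    """Week 1 = days 1-7, week 2 = days 8-14, etc. (an assumed convention)."""
    return (d.day - 1) // 7 + 1


def tally_by_period(weekly_new_upcs: list[tuple[date, int]]) -> dict[str, Counter]:
    """Sum weekly new-UPC counts by (year, month), (year, quarter), and week of month."""
    by_month: Counter = Counter()
    by_quarter: Counter = Counter()
    by_week: Counter = Counter()
    for week_ending, count in weekly_new_upcs:
        by_month[(week_ending.year, week_ending.month)] += count
        by_quarter[(week_ending.year, (week_ending.month - 1) // 3 + 1)] += count
        by_week[week_of_month(week_ending)] += count
    return {"month": by_month, "quarter": by_quarter, "week_of_month": by_week}


# Hypothetical counts, for illustration only (not study data).
weeks = [(date(2003, 1, 4), 40), (date(2003, 1, 11), 55), (date(2004, 1, 3), 38)]
tallies = tally_by_period(weeks)
print(tallies["month"][(2003, 1)])  # 95
print(tallies["week_of_month"][1])  # 78
```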
Data about UPC Sales
We also obtained from ACNielsen a historical dataset of weekly counts of sales of products in the Philadelphia and Pittsburgh markets from 2002 to 2004. The Philadelphia market as defined by ACNielsen covers 24 counties in southeastern Pennsylvania, southern New Jersey, and northern Delaware. According to census data, this area has a population of approximately 8.6M. Similarly, the Pittsburgh market covers 35 counties in southwestern Pennsylvania, eastern Ohio, northern West Virginia, and western Maryland. The census population estimate for this area is 3.4M. This dataset includes sales from drug, grocery, and mass (e.g., Target®) retailers. Estimated market share coverage is 90%.
Reference Time Series
We created a two-year reference time series for both regions by adding each new UPC to its category as soon as it was introduced. We used this time series as the standard against which we compared the time series created under each update policy.
To eliminate the potentially confounding effects of the 76 UPCs reassigned to a different category and the 33 UPCs reassigned to no category, we built the time series as if these changes had been in effect from the outset, because we were more concerned with future maintenance of the NRDM than with the incompleteness of the sales data it captured in the past.
Policy Time Series
For each policy (never, annual, quarterly, and monthly), we created time series for both regions by adding new UPCs to categories at the frequency the policy specifies. Together with the two reference time series, this gave 10 time series in total (four policy series and one reference series for each of the two regions).
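A minimal Python sketch of how a policy time series could be derived from per-UPC weekly sales, assuming a new UPC enters its category at the first scheduled update on or after its week of introduction (that an update also picks up UPCs introduced in the same week is our assumption). The data layout, function names, and week-indexed update schedule are hypothetical; monthly, quarterly, and annual policies would each be expressed as a list of update weeks.

```python
import bisect
from typing import Optional


def first_update_on_or_after(week: int, update_weeks: list[int]) -> Optional[int]:
    """The first scheduled update week at or after `week`, or None."""
    i = bisect.bisect_left(update_weeks, week)
    return update_weeks[i] if i < len(update_weeks) else None


def category_series(
    sales: dict[str, list[int]],
    intro_week: dict[str, int],
    update_weeks: Optional[list[int]],
    n_weeks: int,
) -> list[int]:
    """Weekly category totals under one update policy.

    sales: UPC -> weekly sales counts (length n_weeks).
    intro_week: UPC -> week of introduction; UPCs absent here are
        assumed to be in the initial Master.
    update_weeks: sorted update weeks; [] means "never"; None builds the
        reference series (each UPC added in its week of introduction).
    """
    totals = [0] * n_weeks
    for upc, counts in sales.items():
        if upc not in intro_week:
            start = 0
        elif update_weeks is None:
            start = intro_week[upc]
        else:
            start = first_update_on_or_after(intro_week[upc], update_weeks)
            if start is None:
                continue  # not added during the study period
        for t in range(start, n_weeks):
            totals[t] += counts[t]
    return totals


# Toy data: UPC "A" is in the initial Master; "B" is introduced in week 2.
sales = {"A": [10] * 8, "B": [0, 0, 5, 5, 5, 5, 5, 5]}
intro = {"B": 2}
print(category_series(sales, intro, None, 8))  # [10, 10, 15, 15, 15, 15, 15, 15]
print(category_series(sales, intro, [4], 8))   # [10, 10, 10, 10, 15, 15, 15, 15]
print(category_series(sales, intro, [], 8))    # [10, 10, 10, 10, 10, 10, 10, 10]
```

Because the same function builds the reference series (`update_weeks=None`) and every policy series, the only difference between them is when each new UPC starts contributing sales.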
Measurements
Data completeness
We measured data completeness as the ratio of weekly sales counts in a policy time series to the weekly sales counts of the reference time series. We plotted this ratio over time and also report, for each category, the ratio of sales counts summed over all weeks in the time series.
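Both completeness measures are simple ratios. A minimal Python sketch with hypothetical function names; returning NaN for a week with zero reference sales is our assumption. Note that the summed ratio weights each week by its sales volume, which is not the same as averaging the weekly ratios.

```python
import math


def weekly_completeness(policy: list[float], reference: list[float]) -> list[float]:
    """Ratio of policy to reference sales for each week (NaN if reference is 0)."""
    return [p / r if r else math.nan for p, r in zip(policy, reference)]


def overall_completeness(policy: list[float], reference: list[float]) -> float:
    """Ratio of summed sales: weights each week by its reference volume."""
    return sum(policy) / sum(reference)


# Toy series (not study data): a UPC worth 5 sales/week is added 2 weeks late.
reference = [10, 10, 15, 15, 15, 15, 15, 15]
policy = [10, 10, 10, 10, 15, 15, 15, 15]
print(round(overall_completeness(policy, reference), 3))  # 0.909
```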
False alarm rate
We created a detection algorithm that was sensitive to large differences in sales between the week of updating and the week prior to updating. We designed the algorithm in this manner because we wanted to evaluate the worst-case scenario of increases in false alarms for each policy. We trained the algorithm on data from non-update weeks, computing the mean and standard deviation (SD) of the differences in sales from one week to the next. We then ran the algorithm on update weeks to determine how many signaled an alarm.
We used an alarm threshold of >3.0 SD. To measure the effect of a substantially more sensitive algorithm for detecting changes in sales on update weeks, we also measured the false alarm rate (FAR) at a threshold of >1.0 SD.
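A minimal Python sketch of the detection logic described above, under stated assumptions: alarms are one-sided (only increases count), the SD is the sample SD, each series is trained on its own non-update weeks, and an additional false alarm is an update week that alarms in the policy series but not in the reference series. The function names and toy data are hypothetical.

```python
from statistics import mean, stdev


def alarm_weeks(series: list[float], update_weeks: set[int], threshold_sd: float) -> set[int]:
    """Update weeks whose change from the prior week exceeds the threshold.

    The mean and sample SD of week-to-week changes are estimated from
    non-update weeks only; an alarm requires (change - mean) / SD > threshold.
    """
    diffs = {t: series[t] - series[t - 1] for t in range(1, len(series))}
    training = [d for t, d in diffs.items() if t not in update_weeks]
    mu, sd = mean(training), stdev(training)
    return {t for t in update_weeks if t in diffs and (diffs[t] - mu) / sd > threshold_sd}


def change_in_far(policy: list[float], reference: list[float],
                  update_weeks: set[int], threshold_sd: float, years: float) -> float:
    """Alarms per year on update weeks that occur in the policy series only."""
    extra = alarm_weeks(policy, update_weeks, threshold_sd) - alarm_weeks(
        reference, update_weeks, threshold_sd
    )
    return len(extra) / years


# Toy data (not study data): the policy series lags the reference by 6 units
# in weeks 2-5 because a new UPC is not added until the update in week 6.
reference = [100, 102, 99, 101, 98, 100, 99, 101, 100, 102]
policy = [100, 102, 93, 95, 92, 94, 99, 101, 100, 102]
print(alarm_weeks(policy, {6}, 1.0), alarm_weeks(policy, {6}, 3.0))  # {6} set()
print(change_in_far(policy, reference, {6}, 1.0, years=1.0))          # 1.0
```

In this toy example the catch-up jump in week 6 is about 1.36 SD, so it alarms at the sensitive >1.0 SD threshold but not at >3.0 SD, mirroring why the study reports both thresholds.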
RESULTS
Effect of policies on data completeness
In the absence of any UPC updates, data completeness for total sales (over all 18 categories) dropped to 0.83 in Philadelphia and 0.82 in Pittsburgh over the two-year period (Figure 1). The category with the highest data completeness in the absence of UPC updating is Bronchial remedies, with a value of 1.0 in both cities. The categories with the lowest data completeness are Cough syrup, adult, tablet in Philadelphia at 0.45 and Cough syrup, pediatric, liquid in Pittsburgh at 0.50 (Table 2).
Figure 1.
Data Completeness with No UPC Updates over all 18 Categories.
Table 2.
Data completeness by update policy (Philadelphia/Pittsburgh) and change in false alarm rate (FAR) by update policy (threshold >3.0 SD/>1.0 SD), by category
| Category | Never | Annual | Quarterly | Monthly | Change in FAR*, annual | Change in FAR*, quarterly | Change in FAR*, monthly |
|---|---|---|---|---|---|---|---|
| Antipyretic, adult | 0.83/0.84 | 0.951/0.950 | 0.985/0.983 | 0.996/0.995 | 0/0.5 | 0/0.5 | 0/0 |
| Antipyretic, pediatric | 0.82/0.81 | 0.974/0.974 | 0.993/0.992 | 0.998/0.996 | 0/0 | 0/0 | 0/0 |
| Bronchial remedies | 1.0/1.0 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0/0 | 0/0 | 0/0 |
| Chest rubs | 0.80/0.79 | 0.967/0.968 | 0.999/1.00 | 0.999/1.00 | 0/0 | 0/0 | 0/0 |
| Cold relief, adult, liquid | 0.91/0.92 | 0.982/0.982 | 0.996/0.995 | 0.998/0.998 | 0/0 | 0/0 | 0/0 |
| Cold relief, adult, tablet | 0.83/0.83 | 0.952/0.953 | 0.991/0.986 | 0.996/0.994 | 0/0.5 | 0/0.5 | 0/0 |
| Cold relief, pediatric, liquid | 0.91/0.92 | 0.977/0.976 | 0.996/0.996 | 0.997/0.997 | 0/0 | 0/0 | 0/0 |
| Cold relief, pediatric, tablet | 0.88/0.86 | 0.979/0.977 | 0.994/0.993 | 0.999/0.999 | 0/0 | 0/0 | 0/0 |
| Cough syrup, adult, liquid | 0.97/0.96 | 0.989/0.987 | 0.997/0.995 | 0.998/0.997 | 0/0 | 0.5/0 | 0/0 |
| Cough syrup, adult, tablet | 0.45/0.61 | 0.850/0.898 | 0.996/0.997 | 0.997/0.998 | 0.5/0.5 | 0.5/0.5 | 0/0 |
| Cough syrup, pediatric, liquid | 0.61/0.50 | 0.942/0.928 | 0.996/0.995 | 0.999/1.00 | 0.5/0.5 | 0/0 | 0/0 |
| Diarrhea remedies | 0.85/0.86 | 0.948/0.946 | 0.984/0.983 | 0.987/0.987 | 0.5/0 | 0/0 | 0/0 |
| Pediatric electrolytes | 0.82/0.73 | 0.985/0.975 | 0.997/0.995 | 0.998/0.999 | 0/0 | 0/0 | 0/0 |
| Hydrocortisones | 0.87/0.85 | 0.963/0.961 | 0.994/0.992 | 0.999/0.997 | 0/0 | 0/0 | 0/0 |
| Nasal remedies | 0.92/0.93 | 0.983/0.984 | 0.997/0.997 | 0.997/0.998 | 0/0 | 0/0 | 0/0 |
| Thermometers, adult | 0.89/0.94 | 0.981/0.98 | 0.993/0.991 | 0.996/0.995 | 0/0 | 0/0 | 0/0.5 |
| Thermometers, pediatric | 0.80/0.60 | 0.98/0.970 | 0.994/0.998 | 0.998/0.998 | 0/0 | 0/0 | 0/0 |
| Throat lozenges | 0.75/0.74 | 0.949/0.948 | 0.994/0.994 | 0.997/0.997 | 0/0 | 0/0 | 0/0 |
| Total | 0.83/0.82 | 0.960/0.959 | 0.991/0.989 | 0.996/0.995 | 1.5/2.0 | 1/1.5 | 0/0.5 |
*Change in FAR in alarms per year, Philadelphia only.
The policy of annual UPC updates resulted in an overall completeness of 0.960 in Philadelphia and 0.959 in Pittsburgh (Figure 2).
Figure 2.
Effect of Update Policy on Data Completeness.
The policy of quarterly UPC updates resulted in an overall completeness of 0.991 and 0.989, and monthly UPC updates resulted in an overall completeness of 0.996 and 0.995 (Figure 2).
Effect of policies on false alarm rate
Annual updates increased the false alarm rate (FAR) by 1.5 alarms/year at a threshold of >3.0 SD, and by 2.0 alarms/year at a threshold of >1.0 SD.
Quarterly updates increased the FAR at a threshold of >3.0 SD by 1.0 alarms/year: one false alarm in two years for the Cough syrup, adult, liquid category and one in two years for the Cough syrup, adult, tablet category (Table 2). At a threshold of >1.0 SD, quarterly updates increased the FAR by 1.5 alarms/year.
Monthly updates caused no additional false alarms at a threshold of >3.0 SD: every alarm in the monthly series also occurred in the reference series. At a threshold of >1.0 SD, monthly updates increased the FAR by 0.5 alarms/year: one additional alarm for the Thermometers, adult category.
For Cough syrup, adult, tablet (one of the two categories with an additional false alarm in the quarterly update time series), we plotted the time series of sales counts under quarterly updates against its reference time series (Figure 3). The differences between the two time series are small relative to the natural variance of sales. The update week at which the extra alarm occurred (10/04/03) falls within the period when the two time series are visibly separated (roughly 09/04/03 through 11/04/03).
Figure 3.
Sales of Category Cough syrup, adult, tablet with Quarterly Updates vs. Its Reference Time Series.
DISCUSSION
If UPCs are not updated, data completeness of OTC sales for the 18 product categories of the NRDM declines by approximately 9 percent per year. The similarity of the curves for Philadelphia and Pittsburgh suggests that this result may generalize: the two cities have different grocery store chains, different market shares for retailers present in both regions, and different population demographics.
However, the retail chains that service those two cities are still more similar to each other than to retailers that service cities in other regions of the United States. The same is true with respect to demographics. It is possible, although unlikely, that the decline in data completeness in other locations differs significantly from what we observed.
Monthly updates had the lowest cost in terms of false alarms. Although the differences between policies were small, monthly updates had the smallest effect: one additional false alarm over two years at the most sensitive detection threshold (>1.0 SD), compared with 3 additional alarms for quarterly updates and 4 for annual updates.
Monthly updating was the superior policy when only false alarm rate and data completeness are considered: it had both the highest data completeness (our surrogate for the earliness and sensitivity of outbreak detection) and the smallest increase in false alarm rate. Monthly updates produced so few additional alarms because each one adds an increase in sales that is small relative to the natural week-to-week variability in sales.
The plot of quarterly updates vs. the reference time series for Cough syrup, adult, tablet (Figure 3), one of the few categories with alarms on update weeks, shows only slight differences between the two series. Moreover, in the reference time series for this category there were 9 update weeks whose week-to-week change in sales lay between 0.9 and 1.0 SD above the mean change (>0.9 SD and ≤1.0 SD). If the median increase in sales due to quarterly updates were as large as 0.1 SD of the week-to-week changes in sales, roughly half of these weeks would have crossed the threshold, so we would expect 4–5 extra alarms at a threshold of 1.0 SD; we observed just one. Thus, the additional increase (or smaller decrease) in sales due to even quarterly updates is likely to be <0.1 SD.
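One way to write out this counting argument, assuming the shifts caused by updates are independent of the reference values: let $z_i$ be the standardized week-to-week change on update week $i$ in the reference series, and $\delta_i$ the extra shift (in SD units) that quarterly updates add to it. Any week in the band that is shifted by at least 0.1 SD crosses the threshold, and a median shift of at least 0.1 SD would shift about half of the 9 weeks that far:

$$
z_i \in (0.9,\ 1.0] \ \text{and}\ \delta_i \ge 0.1 \;\Longrightarrow\; z_i + \delta_i > 1.0
$$

$$
\operatorname{median}(\delta) \ge 0.1 \;\Longrightarrow\; \Pr(\delta_i \ge 0.1) \ge \tfrac{1}{2} \;\Longrightarrow\; \mathbb{E}[\text{extra alarms}] \ge \tfrac{1}{2} \times 9 = 4.5
$$

Observing one extra alarm rather than about 4–5 is what supports a median shift below 0.1 SD. This sketch ignores the small effect that updates have on the estimated mean and SD themselves.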
This study has four limitations. First, we studied weekly sales counts (ACNielsen has historical sales data only at that temporal granularity), whereas the NRDM monitors daily counts. Within a weekly count, the increase in sales due to a UPC update may be spread over as many as seven days, for example if a UPC were first scanned on the last day of the week. The step up in daily sales due to UPC updates is thus likely to be higher. Updates might therefore cause more false alarms on time series of daily sales, which could change the preferred policy.
Second, we studied only one detection algorithm. Although it was designed to detect sudden increases due to updates and thus to provide worst-case results, other algorithms may extract more signal from time series of OTC sales (by detecting gradual increases or by analyzing other data such as weather or air quality) and thus reduce the unexplained variance of sales counts. Such algorithms may be more sensitive to UPC updates, potentially changing which policy yields the smallest increase in false alarm rate.
Third, we studied counts aggregated over two large regions, and we cannot extrapolate our results to algorithms that monitor time series for smaller regions such as counties or zip codes. Time series for smaller regions usually have higher relative variability than time series aggregated over large regions, which would lead one to predict a smaller effect of updates on false alarms. However, the effect of updates on week-to-week changes in sales is also likely to vary more in smaller regions, so the effect on false alarms could be greater. Furthermore, data completeness is likely to be more variable at the zip-code level. The net effect of aggregation on the tradeoff between data completeness and false alarms is thus uncertain.
A fourth limitation is that we studied the effect of update policies on only one aspect of outbreak detection performance—the false alarm rate—and used data completeness as a surrogate for earliness and sensitivity. However, since decreased completeness is likely to decrease the sensitivity and timeliness of outbreak detection, it is encouraging that the policy that maximizes data completeness is also the policy that minimizes false alarms.
Had quarterly updates demonstrated a lower false alarm rate than monthly updates, it would have been necessary to determine whether the higher data completeness with monthly updates was worth the cost of additional false alarms when setting the update policy for the NRDM. Ultimately, the utility function for this decision involves the cost of false alarms and the cost of delayed detection. Given the high data completeness and low increases in false alarm rate for both policies, it seems unlikely that one policy would have been strongly preferred.
To our knowledge, this is the first study to document how completeness of biosurveillance data degrades over time if the controlled vocabulary is not updated. It also is the first to measure the effect of various update policies on data completeness and false alarms. Therefore, this study is a reference point for future studies on these topics.
The methods from this study are general and could be applied to studying policies for updating the controlled vocabularies used for other types of biosurveillance data. The tradeoff we studied is unlikely to be unique to monitoring OTC sales: surveillance of laboratory testing, diagnoses, and prescription drugs all involves monitoring categories to which new items (e.g., new tests, ICD-9 codes, and drugs) are added frequently and on an ongoing basis.
CONCLUSION
When UPCs are not updated, data completeness of OTC sales data degrades substantially over a period of two years. Monthly updating of UPCs appears to be superior to annual or quarterly updates both in terms of data completeness and minimizing false alarms. This study introduces general methods that may be applied to similar problems in updating controlled vocabularies in biosurveillance systems.
ACKNOWLEDGEMENTS
This work was supported by Pennsylvania Department of Health Award number ME-01-737, Grant F 30602-01-2-0550 from the Defense Advanced Research Projects Agency, and grant 1 R21 LM008278-01 from the National Library of Medicine.
REFERENCES
1. Wagner M, Espino J, Hersh J, et al. National Retail Data Monitor for public health surveillance. MMWR. 2004;53(September 24 Supplement):40–42.
2. Wagner M, Robinson J, Tsui F, Espino JU, Hogan W. Design of a national retail data monitor for public health surveillance. J Am Med Inform Assoc. 2003;10(5):409–418. doi: 10.1197/jamia.M1357.



