2023 Apr 15:1–30. Online ahead of print. doi: 10.1007/s00146-023-01654-9

Using AI to detect panic buying and improve products distribution amid pandemic

Yossiri Adulyasak 1, Omar Benomar 2, Ahmed Chaouachi 2, Maxime C Cohen 2, Warut Khern-am-nuai 3,
PMCID: PMC10105357  PMID: 37358947

Abstract

The COVID-19 pandemic has triggered panic-buying behavior around the globe. As a result, many essential supplies were consistently out-of-stock at common point-of-sale locations. Even though most retailers were aware of this problem, they were caught off guard and still lack the technical capabilities to address it. The primary objective of this paper is to develop a framework that can systematically alleviate this issue by leveraging AI models and techniques. We exploit both internal and external data sources and show that using external data enhances the predictability and interpretability of our model. Our data-driven framework can help retailers detect demand anomalies as they occur, allowing them to react strategically. We collaborate with a large retailer and apply our models to three categories of products using a dataset with more than 15 million observations. We first show that our proposed anomaly detection model can successfully detect anomalies related to panic buying. We then present a prescriptive analytics simulation tool that can help retailers improve essential product distribution in uncertain times. Using data from the March 2020 panic-buying wave, we show that our prescriptive tool can help retailers increase access to essential products by 56.74%.

Keywords: Anomaly detection, Retail operations, Panic buying, COVID-19

Introduction

The COVID-19 pandemic triggered a wave of panic-buying behavior worldwide at the end of Q1 2020 (e.g., Furutani 2020; Lufkin 2020; Settanni 2020). Consumers rushed to grocery stores and stockpiled large amounts of essential supplies, such as face masks, hand sanitizers, canned food, and toilet paper. Retailers were naturally not prepared for such behavior; consequently, many essential products quickly stocked out at most point-of-sale locations, such as grocery stores and pharmacies (Edmiston 2020). By the time retailers realized the extent of the situation, it was unfortunately too late to react.

Panic buying in most product categories causes short-term shocks and false demand signals that may result in supply chain disruptions and adversely affect long-term profitability. This is because customers stockpile these products without intending to consume larger quantities than usual. In the wake of the panic-buying wave at the end of Q1 2020, retailers are eager to find a solution. The primary objective of this paper is to develop a framework that can systematically alleviate this issue by leveraging artificial intelligence (AI) models and methods. Since panic buying is a rare event and thus difficult to predict, our goal is to deploy AI tools to promptly detect panic-buying behavior and strategically react. Specifically, our proposed framework can help retailers identify situations in which sales patterns become abnormal, constituting demand anomalies in real time. Furthermore, for each detected anomaly, the model provides a concrete, tangible interpretation that can help retailers understand the reasons behind the anomaly. Once the demand anomaly is detected, retailers can strategically react using the following levers: (i) implementing a rationing policy that imposes a limit on the number of items each customer can purchase and (ii) adapting the replenishment strategy to cope with the demand surge. Understanding when to activate a rationing policy, for which product categories, and for how long can have a significant impact on both retailers and consumers (especially given that we will likely experience several future waves of panic buying). In particular, when implemented properly, such rationing policies can broaden the distribution of essential products to a large number of customers (without incurring any additional costs to the retailer). Interestingly, many retailers implemented such rationing policies in 2020 and 2021.
However, the specifics of these policies were often based on intuition and emotional reactions, as opposed to being data driven. In our conversations, several retailers acknowledged the lack of tools at their disposal to guide the deployment of rationing policies. In addition, these rationing policies were often implemented too late, as stated in several anecdotal quotes, such as the following: “What we learned was we didn’t impose product restrictions early enough, and that created a run on the system and created some difficulties for people.”1

In this paper, we collaborate with a large retail grocery chain in North America to develop a real-time anomaly detection system (which leverages unsupervised and supervised machine learning models) that can identify and flag pertinent anomalies related to panic buying. Specifically, we propose a two-stage machine learning approach that allows us to incorporate various internal data sources (e.g., promotions, sales, inventory, store traffic, and online transactions) as well as external data (e.g., social media, Google Trends) to detect anomalies. Such external data can help signal potential anomalies as they occur, which will ultimately provide enough time for retailers to react. We obtain strong validation that our approach successfully detects anomalies for the right period and that virtually no anomalies are detected during periods where we do not expect them. Our anomaly detection model differs from traditional methods in that we adapt the definition of anomalies based on specific business requirements. More precisely, instead of identifying anomalous patterns on a purely statistical basis, we search for instances that have concrete undesired consequences (e.g., triggering a stockout in the near future). To do so, we propose to use a two-stage AI model in which the first stage identifies anomalies and the second stage classifies those anomalies as pertinent or non-pertinent. The second stage has the flexibility to handle different types of anomalies, depending on the managerial context under consideration. The novelty of this paper lies in its application to business contexts, especially regarding how the detected anomalies are classified based on business requirements. Our two-stage model offers flexibility to retailers to detect several types of anomalies based on their business requirements and preferences.
We also present an end-to-end real-world application of the deployment of our data-driven model in collaboration with one of the largest retail grocery chains in North America. We apply our model to three categories of products—toilet paper, canned soup, and household cleaners—during the COVID-19 pandemic (January–May 2020). Our data comprise more than 15 million observations. We first showcase the accuracy of our anomaly detection algorithm compared to several benchmarks. We then present a scenario analysis of the impact of imposing different versions of a rationing policy. Our results suggest that by implementing the right rationing policy at the right time, stockouts could be entirely avoided (or, at least, significantly mitigated), and access to essential products could be granted to 56.74% additional customers. Finally, we leverage our model (estimated using data from the first COVID-19 panic-buying wave at the end of Q1 2020) to simulate the efficacy of our prescriptive tool in a future panic-buying event (either triggered by the pandemic or by other worrisome events). Via extensive simulations, we showcase how our AI-based framework can help retailers improve essential product distribution.

Relevant literature and theoretical background

In this section, we first survey prior studies that are closely related to this paper. We then discuss the theoretical background that underpins our study, specifically on the drivers of panic-buying behavior and the resulting consequences.

Related literature

In this subsection, we survey prior relevant literature. We first review the extensive studies in data-driven retail operations. We then review several existing papers on anomaly detection models and how these models are used in the context of supply chain management. Following that, we discuss several papers that leverage external data, such as social media content, to augment predictive tasks. Lastly, we survey prior works that study the social impact of artificial intelligence.

Data-driven retail operations

In recent years, retail managers have increasingly integrated big data analytics into operations (Choi et al. 2018; Fisher and Raman 2018). In this context, data-driven supply chain management has become a practice that is highly sought after by many firms (Sanders and Ganeshan 2018). This trend is supported by the previous literature, which has consistently demonstrated the value of data in improving firms’ operations, including supply chain configuration (Wang et al. 2016), safety stock allocation (Hamister et al. 2018), risk management (Choi et al. 2016), and promotion planning (Cohen et al. 2017). With the advances in artificial intelligence in recent years, techniques such as deep learning have been increasingly adopted to improve several aspects related to retail operations, such as procurement (Cui et al. 2022), demand forecasting (Birim et al. 2022), inventory replenishment (Qi et al. 2020), and risk management (Wu and Chien 2021). This paper extends the literature in this research stream in the wake of the COVID-19 pandemic, which imposes significant burdens on supply chains and product distribution (e.g., Armani et al. 2020; Ivanov and Dolgui 2020). Specifically, we propose a data-driven framework that can systematically identify demand anomalies for grocery products. We then classify these anomalies based on managerial requirements (e.g., whether these anomalies are pertinent according to a specific managerial definition). Using the identified anomalies, we then develop a prescriptive tool to showcase the implications of our method in terms of essential product distribution. Our data-driven model can help retailers strategically decide when to activate a rationing policy, for which stores and categories of products to do so, and the appropriate limit value (e.g., one versus two items per customer).

Our paper is also related to the nascent literature on retail operations amid the COVID-19 pandemic (Tsyganov 2021). Several papers have focused on supply chain resiliency (e.g., Ivanov and Dolgui 2020; Sodhi and Tang 2020). Han et al. (2020) consider the impact of the pandemic on e-commerce sales. While most previous studies take a descriptive or predictive approach, our paper adopts a data-driven prescriptive approach to enhance future decisions, namely, how retailers can detect panic-buying events and strategically react.

Anomaly detection

Anomaly detection has been extensively studied in the statistics, computer science, and machine learning communities (see, e.g., Mehrotra et al. 2017, and the references therein). Recent anomaly detection models have taken advantage of advances in machine learning and deep learning methods to identify patterns of anomalies based on multivariate inputs. For a recent comprehensive literature review on anomaly detection models that rely on deep learning methods, see Chalapathy and Chawla (2019). Applications of these anomaly detection models include the identification and detection of fraud (e.g., Paula et al. 2016), cyber-intrusion (e.g., Hong et al. 2014), and medical anomalies (e.g., Salem et al. 2013; Sabic et al. 2021). In the focal context of this study, several prior papers have used anomaly detection algorithms for retail demand anomaly detection and empirically demonstrated that existing algorithms perform reasonably well in detecting anomalous demand patterns (e.g., Liu et al. 2012, 2008).

Although the concept of demand anomaly detection is far from new, virtually all prior studies have detected demand anomalies from a purely statistical perspective (e.g., identifying demand observations that are significantly different relative to “normal” patterns). They do not account for business requirements (i.e., how anomalies are defined from a business standpoint) in the detection process. This discrepancy has important implications because the implementation of cutting-edge technologies without taking business requirements into account can generate severe consequences for businesses (Yeoh and Koronios 2010). In this paper, we present an anomaly detection model that explicitly accounts for managerial requirements. While our model was motivated by the COVID-19 pandemic disruptions, it remains flexible enough to ensure that it can be generalized to non-pandemic scenarios. In summary, the novelty of this paper lies in its application to business contexts, especially in how the detected anomalies are classified based on business requirements. As such, we have developed a two-stage model that includes a meaningful interpretation and offers flexibility to retailers to detect several types of anomalies based on their business requirements and preferences.

Leveraging external data

This paper is also related to the stream of literature that utilizes external data sources, such as social media and news articles, for identification, exploration, and prediction tasks. Conceptually, several prior studies have demonstrated that social media tends to capture public opinion (Anstead and O’Loughlin 2015). Indeed, the content shared on social media tends to be generated from diverse information sources (Bakshy et al. 2015). As a result, social media data are widely used for prediction purposes in a wide range of applications. For example, Chen et al. (2014) demonstrate that investors’ opinions transmitted via a social media platform are significantly useful for predicting future stock returns and earnings surprises. Similarly, Tanlamai et al. (2022) develop a machine learning model to identify arbitrage opportunities in a retail market. The authors show that the predictive performance of the model significantly increases when external data, such as online user reviews and online questions and answers, are included as model features. Using social media data to help predict sales of specific products is also common in the literature (e.g., Gaikar and Marakarkandy 2015).

In the context of this paper, whose focus is the COVID-19 pandemic, several prior studies demonstrate the potential usefulness of external data in predictive tasks. For instance, Samaras et al. (2020) empirically studied the role of Google Trends and Twitter in predicting the weekly number of cases of influenza in Greece. The authors show that predictive models that are primarily built based on these data perform well, with a mean absolute percentage error of 18.74% when Twitter is the main source and 22.61% when Google Trends is the main source. In the same vein, Qin et al. (2020) use social media search indexes for COVID-19 symptoms, such as dry cough, fever, and chest distress, in early 2020 to predict the number of COVID-19 cases. The authors employed five different predictive models to empirically show that social media search indexes are an effective early predictor of the number of infections.

The prior studies referred to above have shown that external data, such as social media, can be useful in capturing public opinion, which can ultimately be used to predict public movements. Building upon this empirical evidence, we incorporate external data to complement the use of internal data for anomaly detection in retail demand. More precisely, we leverage external data sources to classify demand anomalies amid the COVID-19 pandemic into pertinent anomalies that have adverse retail consequences (e.g., triggering stockouts in the near future).

Social impact of artificial intelligence

AI has been widely used in several areas of society over the past decade. The use of AI in business and management tends to have direct implications for firms’ profits and consumer welfare (e.g., Cohen et al. 2022b; Khern-am-nuai et al. 2022). Meanwhile, the public sector also utilizes AI in several functions, such as in the legal system (e.g., Cohen et al. 2022a, 2023) and in policy making (van Noordt and Misuraca 2022). This raises several concerns over the social impact of AI, especially regarding how AI interacts with people (Shin 2023), how people perceive the credibility of AI (Shin 2022), and user awareness of privacy in AI-mediated environments (Shin et al. 2022b). Similarly, consumers, businesses, and policymakers have all expressed concerns over the fairness of decisions made based on AI predictions (Shimao et al. 2022; Shin et al. 2022a). The current paper connects to this stream of research by proposing an AI model that can help retailers improve the distribution of essential products during periods when demand anomalies occur (e.g., during the panic-buying periods at the beginning of the COVID-19 pandemic). The improved product distribution brought about by our AI model has positive implications for consumer surplus and social welfare; thus, it represents an application of AI that is potentially of great benefit to society.

Theoretical background

This research is informed by prior studies that have proposed theories on the drivers of panic-buying behavior and the resulting reactions of retailers. In behavioral operations, issues surrounding panic buying have attracted significant attention from researchers (e.g., Croson et al. 2014; Sterman and Dogan 2015). Panic buying, which is often defined as consumers’ attempt to accumulate more goods than they actually need, is triggered by various factors. For example, Li et al. (2021) empirically demonstrate the relationship between individuals’ risk aversion and tendency to panic buy. Other drivers include sociological factors such as herding (Naeem and Ozuem 2021) and psychological factors such as anxiety (Lins et al. 2021). In the context of the COVID-19 pandemic, the primary factor discovered in the literature is the perceived uncertainty of available future supply, which can have implications for personal well-being (see Arafat et al. 2020 for a detailed discussion on the psychological underpinnings of panic buying during the COVID-19 pandemic). The implications of such panic-buying behavior can range from mild inconvenience to destabilization of the entire supply chain (Perera et al. 2020).

In the retail sector, retailers have been generally aware of the possibility of panic-buying events and their potential catastrophic consequences for almost a century (Mitchell 1924). However, most retailers respond to panic-buying behavior retroactively and heuristically (Prentice et al. 2020). For instance, retailers usually delegate the responsibility for detecting and mitigating panic-buying waves to store managers (Arumita 2020) without relying on rigorous data-driven capabilities. As a result, there exists a large discrepancy between the ability to detect panic-buying events and the policies deployed to discourage or mitigate such events (Chakraborti and Roberts 2020). From a pandemic management perspective, this discrepancy can have troublesome implications. First, inconsistent policies can confuse consumers during an already stressful and uncertain time. Second, these inconsistencies may encourage consumers to further mobilize and explore (e.g., to find stores with a more lenient policy), which could contribute to exacerbating the spread of the virus. These implications call for a systematic, data-driven approach that can identify and mitigate panic-buying behavior.

Motivated by the above shortcomings regarding how panic-buying events are handled in practice, this work considers the following research question: Can we develop a practical AI framework to systematically identify and mitigate panic-buying behavior? Recent literature has shown that panic buying tends to generate unexpected and extreme spikes in observable demand (Zheng et al. 2021). We treat such events as anomalies. We then identify and adapt existing machine learning models that can effectively and consistently detect these anomalies. We collaborate with a retail partner to move beyond the standard statistical definitions of these anomalies and to incorporate business implications into the definition. Lastly, we discuss systematic and semi-automatic processes that can be triggered following the detected anomalies to mitigate the potential impact of panic-buying behavior.

Data and empirical context

In this research, we collaborate with one of the largest grocery retail chains in North America. The company manages multiple brands of grocery stores and records billions of dollars in annual revenue. Through this collaboration, we were able to access comprehensive point-of-sale transaction data from 42 stores located in a large metropolitan city. Our dataset spans January 1, 2018 to May 1, 2020. Using this dataset, we first examine the effect of the COVID-19 pandemic on several product categories. Recall that the first wave of COVID-19 in North America in March 2020 had a strong impact on grocery store sales. For example, Statistics Canada reports that the average increase in grocery sales relative to the previous year was approximately 45% among Canadian grocery stores (see Fig. 1). In this context, in Q1 of 2020, there was an intense panic-buying wave in several countries around the world. Customers rushed to grocery stores to purchase large quantities of essential products, such as face masks, hand sanitizers, canned food, and toilet paper.

Fig. 1.

Fig. 1

Grocery sales in Canada (source: Statistics Canada)

We use our data to plot the total number of sales per product category on a weekly basis. We then identify three product categories that were significantly affected by the COVID-19 pandemic: toilet paper (398 products), canned soup (255 products), and household cleaners (554 products).2 We plot the (normalized) weekly transaction volume for toilet paper in Fig. 2. Similar plots for canned soup and household cleaners can be found in Appendix A.1.

Fig. 2.

Fig. 2

Weekly toilet paper sales from January to May 2020 across 42 stores (the first week on the x-axis corresponds to the week of January 5). The y-axis is normalized for anonymity

For all three product categories, we observe a striking spike in the volume of sales around mid-March (i.e., weeks 10 to 12 on the x-axis). Notably, the number of weekly transactions in week 11 doubles or even triples relative to weeks 1–8 for all three categories. This increase is significantly higher relative to the reported average increase in grocery sales due to COVID-19. This led us to select these three product categories as our focal categories in this paper. In total, we have 15,005,425 in-store transactions related to the sales of these three product categories in the 42 stores.

In addition to point-of-sale data, we also independently collect data from external sources related to the COVID-19 pandemic. Our objective is to utilize these data sources to augment the prediction accuracy of our models. Specifically, we collect data from the following three external sources3:

  1. First, we collect the data from Google Trends searches. In particular, we record the number of searches for the term “COVID-19.” In addition, we record the number of searches that are related to the product categories under consideration (i.e., we collect the daily search trend for the terms “toilet paper,” “household cleaners,” and “canned soup”). We limit the collection scope to searches within the metropolitan area of the city in our dataset between January 1 and May 1, 2020.

  2. Second, we collect data from the social networking platform Twitter. Specifically, we collect tweets that contain one of the common COVID-19 hashtags defined by Lamsal (2020), as well as tweets that are related to the product categories under consideration. In other words, we collect tweets that contain one of the common COVID-19 hashtags, mention toilet paper, household cleaners, or canned soup, or both. Similar to our Google Trends data collection, we limit the scope to tweets within the metropolitan area of the city in our dataset between January 1 and May 1, 2020. Based on this collected data, we calculate the volume of the tweets as well as their sentiment.

  3. Third, we collect news article data from three local news outlets that operate in the same metropolitan area as the city in our dataset. These three news outlets are the top three local news outlets based on their Google search ranks. We then collect all the news articles that mention either the term “COVID-19” or one of the product categories under consideration or both between January 1 and May 1, 2020. Based on this dataset, we calculate the volume of the news articles and their sentiment.
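The selection criterion above (a document qualifies if it contains a COVID-19 hashtag, mentions a focal product category, or both) can be sketched as a simple keyword filter. The hashtag set below is a small illustrative subset rather than Lamsal's full list, and the function name is ours, not part of the paper's pipeline:

```python
# Illustrative filter for selecting tweets/news articles: keep a document
# if it mentions a COVID-19 hashtag, a focal product category, or both.
# COVID_TAGS is a small sample, not Lamsal's (2020) full hashtag list.
COVID_TAGS = {"#covid19", "#coronavirus", "#covid"}
CATEGORY_TERMS = {"toilet paper", "household cleaner", "canned soup"}

def is_relevant(text: str) -> bool:
    t = text.lower()
    has_tag = any(tag in t for tag in COVID_TAGS)
    has_category = any(term in t for term in CATEGORY_TERMS)
    return has_tag or has_category
```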

We collect these external data using an automatic script that runs on a daily basis. The sentiments of the tweets and news articles are analyzed using the VADER (Valence Aware Dictionary and sEntiment Reasoner) package in Python (Gilbert and Hutto 2014), which is widely used in the literature to analyze the sentiment of textual content in media and social media (e.g., Ilk et al. 2020). The package analyzes the input content and produces four output sentiment scores: positive, neutral, negative, and compound. The first three scores range between 0 and 1 and represent the proportion of positive content, neutral content, and negative content, where the sum of the three scores is 1. Meanwhile, the compound score, which is the score that we use in our study, normalizes the three scores into a single score. The compound score ranges from −1 (extremely negative) to +1 (extremely positive). We have a total of 337 tweets and 484 news articles that fit our selection criteria.4

To illustrate the trend of the external data we collected, we plot the volume of the search terms on Google Trends that are related to toilet paper in Fig. 3. Interestingly, we observe that the peak of toilet paper sales driven by the COVID-19 pandemic on March 13, 2020 (see Fig. 2) coincides with the peak in Fig. 3. In addition, Fig. 4 plots the volume of tweets and news articles that contain the terms “COVID-19” and “toilet paper.” Once again, a similar trend can be observed. Overall, these two figures provide strong evidence that external data can be useful in the context of the predictive tasks studied in this paper. Similar figures for canned soup and household cleaners are available in Appendix A.2.

Fig. 3.

Fig. 3

Google Trends data for toilet paper search terms

Fig. 4.

Fig. 4

Daily volume data for tweets and news articles related to the toilet paper category

AI model for demand anomaly detection

The objective of our demand anomaly detection model is twofold. First, we aim to develop a flexible framework that is applicable to the COVID-19 pandemic context as well as to other scenarios where demand anomalies may occur (e.g., natural catastrophes such as hurricanes and severe storms). Second, given the urgency and criticality of the pandemic's impact on the distribution of essential products, we also ensure that our model can be tailored to the specific business requirements of the COVID-19 context. We reconcile these two objectives by developing a two-stage model with a flexible framework. The first stage (labeled anomaly detection) applies an unsupervised anomaly detection model to identify anomalies in sales transactions. The second stage (labeled anomaly labeling or pertinent classification) is tailored to the COVID-19 situation by taking managerial requirements into account to classify the detected anomalies from the first stage into pertinent anomalies from a business perspective. As discussed, this stage also incorporates external data sources to improve the classification performance. The second stage is modular in the sense that it can be adapted to various business contexts, depending on the desired definition of an anomaly. We summarize the design and function of our two-stage model in Fig. 5. The first stage relies on unsupervised machine learning (ML) methods since data labels are not available in our training dataset, whereas the second stage is based on supervised methods. The overall ML pipeline architecture is presented in Appendix C, whereas the specifics of the ML methodologies we used are relegated to Appendix D.

Fig. 5.

Fig. 5

Design of our two-stage model

First-stage model: anomaly detection

Our first-stage model aims to identify sales observations that are anomalous (e.g., large sharp increases). We construct our dataset using a sliding window mechanism (Sejnowski and Rosenberg 1987) to create a time-series dataset. Specifically, we use a window of 3 h that comprises the aggregated observations of 36 data points, each with a 5-min interval. Figure 6 illustrates the construction process of our data. The detailed step-by-step feature preparation process is available in Appendix E.

Fig. 6.

Fig. 6

Structure of the data (for illustrative purposes, we assume the data starts at 9 AM)

Each observation represents the count of articles sold in the focal category in a given store during the corresponding interval. We aggregate the products at the category level to ensure sufficient data variation. However, we highlight that the model performance is independent of the aggregation level (i.e., our model can be applied at the product or sub-category level as long as the underlying data represents a large enough sample with sufficient variation). Since the dataset does not have a label (i.e., there is no label that indicates whether an observation is anomalous), we adopt ML methods designed to detect anomalous events in an unsupervised fashion.
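The sliding-window construction described above (overlapping 3-hour windows of 36 five-minute counts) can be sketched with NumPy; the counts below are synthetic stand-ins for the per-store, per-category sales series:

```python
# Sketch of the sliding-window mechanism: 5-minute sales counts are grouped
# into overlapping 3-hour windows of 36 points each. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(lam=4, size=288)  # one day of 5-minute intervals

WINDOW = 36  # 36 x 5 min = 3 h
windows = np.lib.stride_tricks.sliding_window_view(counts, WINDOW)
# Each row of `windows` is one time-series observation fed to the
# first-stage detector; consecutive rows overlap by 35 points.
```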

To ensure that our results do not overly depend on a single approach, we consider several anomaly detection methods for the first stage. We ultimately select two methods that detect anomalies using different approaches. The first method is a time-series-based anomaly detection method called Donut (Xu et al. 2018b). It uses state-of-the-art deep learning variational autoencoders to suppress time-series components (trend, seasonality, and noise) and detect anomalies in an unsupervised manner. The second method is a tree-based model-free anomaly detection algorithm called the extended isolation forest (EIF) (Hariri et al. 2021). The core operation of the EIF is to identify observations that differ from the rest of the data. For more details on these two approaches, we refer the reader to the aforementioned papers.5
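To give a sense of how a tree-based detector scores windows, the sketch below uses scikit-learn's standard IsolationForest as a simplified, widely available stand-in (the paper itself uses the extended variant, EIF, together with Donut; neither is reproduced here), with synthetic "normal" windows and one panic-style spike:

```python
# Simplified stand-in for the first-stage detector using scikit-learn's
# IsolationForest (not the EIF/Donut models used in the paper).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
train = rng.poisson(lam=4, size=(500, 36)).astype(float)  # "normal" windows
spike = np.full((1, 36), 40.0)                            # panic-style surge

forest = IsolationForest(n_estimators=200, random_state=1).fit(train)
# score_samples returns higher values for inliers; negate so that
# larger = more anomalous, matching the anomaly-score convention above.
scores = -forest.score_samples(np.vstack([train[:5], spike]))
```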

We train the Donut model using the sales data from 2018 and 2019. Meanwhile, the EIF model is trained on the 2019 data.6 We then apply both models to the sales data in the first 4 months of 2020 (recall that panic-buying behavior in North America occurred mainly in March and April 2020). The output of both models is an anomaly score for each observation. Observations with an anomaly score higher than a certain threshold in at least one model (since Donut and EIF are complementary approaches) are considered anomalies, which become inputs for the second-stage model. We treat the threshold in both models as hyperparameters (i.e., parameters whose values are chosen before training the algorithm and used to control the learning process) that are optimally selected via a cross-validation procedure to maximize the performance of our second-stage model.
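The union rule above (an observation is flagged if either model's anomaly score exceeds its threshold) can be sketched as follows; the score values and thresholds are illustrative, whereas in practice the thresholds are hyperparameters tuned by cross-validation against second-stage performance:

```python
# Sketch of the flagging rule: an observation is an anomaly if its score
# exceeds the threshold in at least one of the two complementary models.
# Scores and thresholds below are illustrative, not the tuned values.
import numpy as np

donut_scores = np.array([0.10, 0.85, 0.30, 0.95])
eif_scores = np.array([0.20, 0.15, 0.90, 0.40])
TAU_DONUT, TAU_EIF = 0.8, 0.8  # tuned via cross-validation in practice

flagged = (donut_scores > TAU_DONUT) | (eif_scores > TAU_EIF)
# Flagged observations become the inputs to the second-stage classifier.
```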

Second-stage model: pertinent anomaly classification

The output of the first stage consists of a list of anomalies detected from a purely statistical perspective. However, even though these anomalies are significantly different from “normal” patterns, not all of them lead to adverse consequences from a business perspective. In our second stage, we label these anomalies as pertinent and non-pertinent based on managerial expertise. Subsequently, we use a supervised learning model trained on input features associated with the labeled anomalies to detect pertinent future anomalies.

In collaboration with our retail partner, we defined three types of consequences associated with pertinent anomalies in the context of panic buying: (i) anomalies that trigger a stockout of at least one product in the category within the next 3 days, (ii) anomalies that lead to subsequent anomalies in the same category and store (i.e., at least 10% of observations in the next 3 h in the focal store are identified as anomalies), and (iii) anomalies that lead to a spread of anomalies of the same category to other stores (i.e., at least ten other stores experience more than one anomaly in the following 3 h). We also vary the time length in these definitions to provide an additional layer of flexibility. Formally, an anomaly is an observation with a rare pattern that deviates significantly from the majority of the data; in the context of panic buying, it translates into a sudden, very high spike in demand. As discussed, we further classify anomalies into three pertinence types depending on their consequences: triggering a stockout, leading to subsequent anomalies in the same store, and leading to subsequent anomalies in other stores. These business requirements are fully flexible, so practitioners or researchers who adopt our framework can label pertinent anomalies differently based on their specific requirements.
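As an illustration, rule (i) reduces to a simple time-window check. The timestamps, helper name, and three-day horizon below are illustrative; rules (ii) and (iii) would follow the same pattern with different event streams:

```python
from datetime import datetime, timedelta

def label_pertinent(anomaly_times, stockout_times, horizon_days=3):
    """Rule (i): an anomaly is pertinent if at least one product in the
    category stocks out within `horizon_days` after the anomaly."""
    horizon = timedelta(days=horizon_days)
    return [any(t <= s <= t + horizon for s in stockout_times)
            for t in anomaly_times]

# Hypothetical events: two detected anomalies, one observed stockout.
anoms = [datetime(2020, 3, 12, 9), datetime(2020, 3, 20, 15)]
stockouts = [datetime(2020, 3, 13, 18)]
print(label_pertinent(anoms, stockouts))  # [True, False]
```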

As in the first stage, we consider several classification models. We ultimately opt for the gradient-boosting decision tree (GBDT) (Friedman 2001) as our main method for this stage based on its superior performance. Specifically, we adopt an efficient implementation of the GBDT called LightGBM (Ke et al. 2017). This model considers four groups of input features, associated with basket statistics, prices and promotions, time components, and anomalies, for a total of twenty-nine input variables. Since anomalies can be detected concurrently across multiple stores, the second-stage model is trained using input features obtained from all stores in the same region. To this end, we apply a z-score normalization to the sales data (i.e., quantity and number of transactions) of each store prior to computing the input features.7 The complete list of predictors used in our second-stage model is reported in Table 15 in Appendix F.
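A minimal sketch of this second-stage pipeline is shown below. We use scikit-learn's GradientBoostingClassifier as a self-contained stand-in for LightGBM's LGBMClassifier, and synthetic data in place of the twenty-nine engineered features; everything in this snippet is illustrative rather than the authors' actual pipeline:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the engineered features of labeled anomalies.
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # hypothetical "pertinent" label

# z-score normalization (in the paper, applied to each store's sales data
# before the features are computed; applied directly here for illustration).
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 70/30 train-test split, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]  # likelihood that an anomaly is pertinent
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```

Swapping in `lightgbm.LGBMClassifier` would leave the rest of the pipeline unchanged, since both expose the same fit/predict interface.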

Table 15.

List of features from internal data sources

| Feature name | Description |
| --- | --- |
| basket size sum | Number of articles in the window |
| basket assortment sum | Number of article assortments in the window |
| basket size avg | Average number of articles in the window |
| basket assortment avg | Average number of article assortments in the window |
| basket size cat sum | Number of category articles in the window |
| basket size cat avg | Average number of category articles in the window |
| basket assortment cat avg | Average number of category article assortments in the window |
| market share | Category sales over all articles in the window |
| agg sales | Number of category articles in the window |
| weighted price | Weighted price of the category articles (weighted by quantity) |
| assortment size | Number of category assortments in the window |
| promo assortment | Number of category promotion assortments in the window |
| promo sales | Number of category promotions in the window |
| store n lanes | Number of lanes open in the window |
| basket count | Number of baskets in the window (frequency/traffic) |
| day | Day of the week |
| month | Month of the year |
| week | Week of the year |
| hour | Hour of the window |
| donut cluster | Donut cluster result |
| donut score | Donut test-score result |
| donut score scaled | Donut test score scaled to [0, 1] |
| eif cluster | EIF cluster result |
| eif score | EIF cluster score |
| nb past sales spike | Number of past sales-spike flags in other stores |
| nb past stockouts | Number of past stockout flags |
| holidays flag | Holiday indicator |

We then split the input data into a 70% training set and a 30% test set. The GBDT model is trained using the training set, and its performance is evaluated on the test set. All variables related to external data are lagged by one time period to address a potential reverse causality issue. Ultimately, this model predicts for each product category and each 3-h window whether an anomaly is pertinent (and provides a likelihood score).

Anomaly detection results

Detected anomalies

We first present a visualization of the anomalies detected by our first-stage model in Fig. 7. Recall that each observation in our data is the aggregated sales volume of all the products in a category during a 3-h time window. The x-axis represents time and the y-axis represents the (normalized) sales volume, so each dot corresponds to the (normalized) aggregate category sales during a 3-h interval. For conciseness, we focus on the toilet paper category in the largest store in our dataset (the same plots for the two other product categories can be found in Appendix A.3). Intuitively, observations with sales higher than a certain threshold are identified as anomalies, and the higher the sales, the higher the anomaly score. To ease visualization, we assign four colors to the dots depending on severity: gray corresponds to no anomaly, yellow to a low score, orange to a medium score, and red to a high score. These colors are only for visualization and are not used in any of our analyses.

Fig. 7.

Fig. 7

Anomalies detected in the toilet paper category in one store

Our second-stage model classifies the anomalies detected in the first stage as pertinent or non-pertinent. In Fig. 8, we plot the non-pertinent or false alarms (in black) and the pertinent alarms (in other colors) for the toilet paper category. The non-pertinent observations are detected as anomalies from a statistical perspective but did not yield business consequences as per our definitions. As we can see, our second-stage model can successfully classify anomalies as pertinent, allowing us to remove the non-pertinent anomalies, thus making the detection process more suitable from a managerial perspective. As mentioned before, our method offers the flexibility to define the concept of pertinence, depending on the context under consideration.

Fig. 8.

Fig. 8

Anomalies in the toilet paper category classified as pertinent (colored) and non-pertinent (black)

To summarize, Table 1 reports the total number of anomalies detected by each algorithm (Donut and EIF) in the first stage, as well as the total (i.e., the union of both sets). It also reports how many of these anomalies are eventually classified as pertinent in the second stage. Interestingly, Donut detects a significantly larger number of anomalies than EIF, yet EIF is more effective in the sense that a much higher proportion of its anomalies are classified as pertinent in the second stage. This illustrates the complementarity of the two approaches. Note that the total number of pertinent anomalies reported in the last column is not the sum of the two preceding columns because the same pertinent anomaly can be detected by both Donut and EIF in the first stage.

Table 1.

Total number of detected and pertinent anomalies

| Category | Detected (Donut) | Detected (EIF) | Total detected | Pertinent (Donut) | Pertinent (EIF) | Total pertinent |
| --- | --- | --- | --- | --- | --- | --- |
| Toilet paper | 7,790 | 2,719 | 8,250 | 677 | 1,756 | 2,200 |
| Canned soup | 10,895 | 2,021 | 11,405 | 1,291 | 372 | 1,365 |
| Household cleaners | 6,267 | 1,349 | 6,350 | 356 | 412 | 701 |

In addition, recall that anomalies detected in the first stage are considered pertinent if they fit any of the three definitions of pertinence presented in Sect. 4.2. Table 2 reports the percentage and number of anomalies that are labeled as pertinent according to each of our business definitions in the second stage. Note that household cleaners exhibit no anomalies that signal a spread to other stores but include several anomalies that trigger subsequent anomalies in the same store, as well as anomalies that lead to future stockouts.

Table 2.

Total percentage and number of pertinent anomalies

Each cell reports the percentage and number of pertinent anomalies.

| Category | Triggering anomalies | Spreading anomalies | Stockout anomalies |
| --- | --- | --- | --- |
| Toilet paper | 73% (1,606) | 56% (1,232) | 90% (1,980) |
| Canned soup | 73% (997) | 60% (822) | 15% (206) |
| Household cleaners | 82% (574) | 0% (0) | 42% (294) |

As is common in the ML literature (e.g., see Larose 2015), we use three complementary metrics to measure the performance of the classification algorithms: precision (the percentage of flagged results that are relevant), recall (the percentage of relevant results correctly flagged by the algorithm), and the F1 score (a measure of a model's accuracy defined as the harmonic mean of precision and recall). Accuracy itself would not be a suitable metric here given the severe class imbalance: anomalies are rare events with less than 1% occurrence in the historical data, so even a model that never flags an anomaly would achieve more than 99% accuracy. We report the overall performance of our second-stage model in Table 3, which includes the precision, recall, and F1 score of the anomaly detection outcomes for the three product categories (toilet paper, canned soup, and household cleaners) with respect to the different definitions of pertinence. Overall, our model performs well for toilet paper across all three types of pertinent anomalies. We do not report the classification of spreading anomalies for household cleaners since no pertinent anomaly exists under this definition for that category. Finally, we report the days on which we identify at least one pertinent anomaly in Table 4. As expected, the pertinent anomalies are concentrated mainly in March 2020. This finding strongly validates our approach: it detects anomalies in the right period, whereas virtually no anomalies are detected during periods in which no panic buying was observed. These results provide a strong sanity check on the validity of our approach.
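The following toy computation illustrates why accuracy is uninformative under such class imbalance, while precision, recall, and the F1 score remain discriminating (the counts are hypothetical):

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

# Hypothetical labels for 1,000 windows with 10 true anomalies (1% positives).
y_true = [1] * 10 + [0] * 990

# A degenerate "never flag anything" classifier is 99% accurate but useless.
y_never = [0] * 1000
print(accuracy_score(y_true, y_never))             # 0.99
print(f1_score(y_true, y_never, zero_division=0))  # 0.0

# A useful detector: catches 8 of the 10 anomalies with 2 false alarms.
y_pred = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 988
print(precision_score(y_true, y_pred))        # 0.8
print(recall_score(y_true, y_pred))           # 0.8
print(round(f1_score(y_true, y_pred), 2))     # 0.8
```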

Table 3.

Summary of model performance

| Category | Triggering (P / R / F1) | Spreading (P / R / F1) | Stockout (P / R / F1) |
| --- | --- | --- | --- |
| Toilet paper | 81% / 81% / 81% | 93% / 93% / 93% | 77% / 74% / 75% |
| Canned soup | 83% / 53% / 68% | 89% / 80% / 84% | 71% / 52% / 60% |
| Household cleaners | 86% / 68% / 76% | N/A (no pertinent anomalies) | 70% / 58% / 63% |
Table 4.

Number of days in which pertinent anomalies are detected in 2020

| Category | January | February | March | April |
| --- | --- | --- | --- | --- |
| Toilet paper | 0 | 0 | 6 | 1 |
| Canned soup | 0 | 0 | 22 | 1 |
| Household cleaners | 0 | 0 | 25 | 7 |

The role of external data

We next investigate the impact of incorporating external data sources (Google Trends, Twitter, and news data) into our second-stage model. In other words, we investigate the improvement in classification performance for models that incorporate these external data sources relative to a baseline model that uses only internal data. Here, we report our model performance in terms of classifying pertinent anomalies for different combinations: using only internal data, using internal data with each type of external data, and using internal data with all types of external data. For brevity, we label an observation as a pertinent anomaly if it satisfies any of our three definitions. Consistent with the prior literature, we find that adding external data helps enhance the performance of our classification model. The results are reported in Tables 5 and 6. Observe that the F1 score for the model that incorporates all external data sources in the second-stage classification is consistently above 90%.

Table 5.

Model performance with respect to external data (part 1)

| Category | Internal only (P / R / F1) | Internal + Google Trends (P / R / F1) | Internal + News articles (P / R / F1) |
| --- | --- | --- | --- |
| Toilet paper | 91% / 85% / 88% | 95% / 95% / 95% | 95% / 95% / 95% |
| Canned soup | 65% / 55% / 59% | 73% / 55% / 63% | 85% / 55% / 67% |
| Household cleaners | 84% / 65% / 73% | 89% / 69% / 78% | 84% / 65% / 73% |
Table 6.

Model performance with respect to external data (part 2)

| Category | Internal + Twitter (P / R / F1) | Internal + All (P / R / F1) |
| --- | --- | --- |
| Toilet paper | 95% / 95% / 95% | 97% / 95% / 96% |
| Canned soup | 85% / 55% / 67% | 94% / 91% / 93% |
| Household cleaners | 83% / 69% / 75% | 96% / 89% / 92% |

To clearly establish the influence of each external data source in the second-stage classification task, we compute the Shapley value of each predictor used in the classification model. The Shapley value was originally developed as a solution concept in cooperative game theory: it represents the average expected marginal contribution of each player to the payoff of the game (Shapley 1953). In recent years, it has been widely used to explain the influence of individual predictors in ML models (e.g., Husain et al. 2016; Ma and Tourani 2020; Pamuru et al. 2022). We choose this model-interpretability metric (rather than the built-in variable-importance function in LightGBM) because our retail partner requires the interpretability layer to be model-agnostic, in case the predictive model in the ML pipeline changes in the future. In the context of this paper, the Shapley value captures the average marginal contribution of each predictor to the eventual outcome (i.e., whether the anomaly is pertinent). To compute the Shapley values, we use the SHAP library (Lundberg and Lee 2017), which relies on scalable additive feature attribution methods. The resulting mean absolute Shapley values can then be used as a measure of global feature importance based on the magnitude of the feature attributions.
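For intuition, the Shapley value can be computed exactly by enumerating all coalitions when the number of features is small. The additive payoff function below is a hypothetical toy example; in practice, the SHAP library approximates these values efficiently for tree models:

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n):
    """Exact Shapley values by enumerating all coalitions of n players.
    `value(S)` returns the payoff of the feature subset S (a frozenset).
    Feasible only for small n; SHAP approximates this for tree ensembles."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                S = frozenset(S)
                # Shapley weight of a coalition of size |S| among n players.
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (value(S | {i}) - value(S))
    return phi

# Hypothetical additive payoff: features 0, 1, 2 contribute 3, 2, 1.
contrib = {0: 3.0, 1: 2.0, 2: 1.0}
v = lambda S: sum(contrib[i] for i in S)
print([round(p, 6) for p in shapley_values(v, 3)])  # [3.0, 2.0, 1.0]
```

For an additive payoff, each feature's Shapley value equals its individual contribution, which is exactly what the brute-force enumeration recovers.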

Figure 9 shows the feature importance derived from the Shapley values of the top ten predictors (in terms of influence on the classification task) for the toilet paper category (a higher value indicates a higher influence on the classification). As we can see, five out of the top ten features come from external data sources. This finding explains the superior performance of the model that uses external data sources relative to the model that uses only internal data, as reported in Tables 5 and 6. In addition, temporal features (hour of the day and day of the week) also seem to play an important role. The feature importance plots based on the Shapley values of the top ten predictors for the canned soup and household cleaners categories are available in Appendix A.4.

Fig. 9.

Fig. 9

Feature importance (Shapley values) of top ten predictors in the second-stage classification for the toilet paper category

Alternative specifications

We next consider several alternatives to the specification of our two-stage model and report several comparison results.

Varying the definition of triggering anomalies

Recall that we defined an anomaly as pertinent if it triggers stockouts within the next 3 days. In Fig. 10, we vary the length of this horizon; specifically, we consider 1 day and 7 days as alternative values. As expected, when the horizon increases, the percentage of detected pertinent anomalies also increases, in a concave fashion. The fact that our model works under various horizons allows us to adapt the definition of a pertinent anomaly to the business requirements.

Fig. 10.

Fig. 10

Analysis of pertinent anomalies related to stockouts in the same store for different time lengths: 1 day, 3 days, and 7 days

Using single-stage anomaly detection models

Recall that our anomaly detection model relies on two stages: detecting anomalies (first stage) and classifying them as pertinent (second stage). The first stage relies on the time series of the sales, whereas the second stage uses several features (e.g., promotions and external data) to perform the classification task. An alternative approach would be to consider a single-stage anomaly detection model that utilizes all the features at once. We next compare the performance of our two-stage model to several single-stage anomaly detection methods (vanilla model, EIF multivariate, and Donut multivariate). The vanilla model refers to a simple statistical method whereby anomalies are defined as observations with a value higher (or lower) than two standard deviations from the mean of the sales in the training dataset. The EIF and Donut multivariate models are extensions of the (unsupervised learning) anomaly detection models considered in Sect. 4.1, where all the features are included in the anomaly detection phase (i.e., in the first stage).

The results for all three product categories are reported in Table 7. Since our retail partner is particularly interested in how effective the methods are at detecting pertinent anomalies in each specific store-day pair, we evaluate the methods using two key measures reported in each cell of Table 7: the number of store anomalous days with actual pertinent anomalies (SADPA, the first number) and the number of store anomalous days detected by each method (SAD, the second number). The ratio SADPA/SAD is then equivalent to precision at the store-day level. This ratio is important because the retailer needs to review and react to the detected anomalies, so incorrectly detected anomalies (i.e., false positives) can significantly erode users' trust in the tool. As we can observe in Table 7, our two-stage model effectively rules out the majority of first-stage anomalies that are likely to be non-pertinent. Table 7 showcases three important benefits of our approach: (i) our model detects a much larger number of actual pertinent anomalies than the three single-stage benchmarks; (ii) our model produces far fewer false alarms (e.g., during the peak period in March and April 2020, the false positive rate of the two-stage model over the three categories is approximately 8.8%, versus 54.4% for the vanilla model); and (iii) our model detects anomalies when it should (March and April) but not when none exist (January and February). These results clearly convey the need for a two-stage approach in which the various features are leveraged for classification in the second stage, especially in the empirical context of this study. More importantly, without the second stage, the retailer would need to deal with an enormous number of detected anomalies that often turn out to be false alarms.
Meanwhile, although traditional single-stage models may be appealing because they are easier to deploy and maintain, they detect a large number of non-pertinent anomalies that can be costly to the company's operations, while simultaneously failing to detect many pertinent anomalies.

Table 7.

Prediction comparison of different benchmarks for 42 stores

Each cell reports SADPA/SAD.

| Category | Method | Jan 2020 | Feb 2020 | Mar 2020 | Apr 2020 |
| --- | --- | --- | --- | --- | --- |
| Toilet paper | First-stage vanilla model | 0/619 | 0/495 | 33/419 | 4/419 |
| Toilet paper | First-stage EIF multivariate | 0/88 | 0/15 | 112/144 | 16/75 |
| Toilet paper | First-stage Donut multivariate | 0/35 | 0/34 | 155/439 | 26/84 |
| Toilet paper | Our two-stage model | 0/0 | 0/0 | 252/271 | 42/46 |
| Canned soup | First-stage vanilla model | 0/420 | 0/138 | 448/873 | 29/331 |
| Canned soup | First-stage EIF multivariate | 0/214 | 0/312 | 240/1040 | 21/202 |
| Canned soup | First-stage Donut multivariate | 0/38 | 0/210 | 733/880 | 38/389 |
| Canned soup | Our two-stage model | 0/0 | 0/55 | 918/1073 | 38/42 |
| Household cleaners | First-stage vanilla model | 0/168 | 0/520 | 1063/1211 | 280/822 |
| Household cleaners | First-stage EIF multivariate | 0/14 | 0/670 | 870/870 | 268/675 |
| Household cleaners | First-stage Donut multivariate | 0/20 | 0/401 | 912/984 | 294/560 |
| Household cleaners | Our two-stage model | 0/0 | 0/0 | 1004/1004 | 294/360 |

So far, our focus has been on the predictive side by developing a model that can successfully detect retail demand anomalies according to different definitions of pertinence. The next question is how retailers can leverage the insights generated by our model to improve their operational decisions. We investigate this question by developing a prescriptive simulation tool in the next section.

Prescriptive retail operations amid pandemics

Once pertinent anomalies are detected, the retailer needs to react quickly and strategically. In particular, the demand planner can analyze the impact of panic buying on store inventory and prescribe the necessary actions. More specifically, our model sends a signal indicating potentially pertinent anomalies that may affect the retailer's operations; the retailer can then react by deploying two potential countermeasures.

The first countermeasure is to impose a quantity limit per customer for a group of similar products, typically at the category or sub-category level. This type of rationing policy was widely adopted by several retailers in the first quarter of 2020 and in 2021.8 Such a policy aims to ensure that a larger number of customers can purchase essential products that are in high demand. The simulation model presented in this section can be used to decide when to activate such a rationing policy, for which categories of products, and which limit value to set. To our knowledge, no such data-driven tool has previously been developed; retailers have heretofore decided when and how to use rationing policies based on intuition. Consequently, there is no clear current practice against which to benchmark our results, since no systematic approach had been applied to this problem before ours.

The second countermeasure is to arrange an urgent shipment to replenish inventory. If necessary, the demand planner can place a special order request when certain products are expected to be out-of-stock before the next scheduled delivery. Such an intervention, however, is costly and is mainly reserved for exceptional situations in which a substantial stockout is anticipated.
In this section, we focus on the first countermeasure (the rationing policy); one can easily combine our simulation tool with the second (urgent replenishment shipments), as discussed in Sect. 5.3. We next elaborate on our prescriptive simulation tool and apply it to a simulated future panic-buying event to showcase its practical impact.

Prescriptive tool for rationing policies

We conduct a scenario analysis at the product-group level in which a limit c is imposed on the total number of products purchased per customer. For example, the store manager may decide to allow each customer to purchase no more than c=2 packs of toilet paper of any brand. The product groups on which such a limit can be imposed are predetermined by the retailer and typically represent a set of products that are relatively homogeneous and substitutable for each other. Our prescriptive tool can be used to decide (i) when to activate such a rationing policy, (ii) for which group of products, and (iii) the best limit value (i.e., the value of c). Taken together, we develop a simulation tool that allows retailers to proactively test “what-if” scenarios in response to the detected demand anomalies due to panic buying.

The total demand for each group of products g is estimated by a decomposition model comprising two components: (i) the estimated arrival rate of customers for product group g, and (ii) the distribution of the number of units from this product group purchased by a customer (the basket distribution). The first component is estimated from the observed sales during the periods with pertinent anomalies, identified by the classification model from Sect. 4.2, adjusted by intra-day and intra-week seasonality factors (we omit the details for conciseness). The second component is the empirical distribution of the number of purchased units per customer in group g, obtained from the sales transaction data during the same period. The expected total demand for products in group g under limit c during the time interval (t,t′), where t′>t is any subsequent time period, can then be calculated as follows:

$$E_P\bigl[\tilde{D}_{g,c}(t,t')\bigr] = \sum_{s=t}^{t'} \lambda_g^s \sum_{k \ge 1} \min\{k, c\}\, P\bigl(\tilde{k}_g = k\bigr).$$

Here, $\lambda_g^s$ is the estimated arrival rate of customers for products in group g at time s, $\tilde{k}_g$ is a random variable representing the number of units of products in group g purchased by a customer, and $P(\tilde{k}_g = k)$ is its probability mass function. The demand planner can then use this simulation tool to estimate the total group demand over time under different limits and select the appropriate value of c for each product group and time period.
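The expected-demand formula can be evaluated numerically as follows; the arrival rates and basket distribution below are hypothetical:

```python
def expected_group_demand(arrival_rates, basket_pmf, c):
    """Expected demand for a product group under a per-customer limit c
    over the periods covered by `arrival_rates`.
    `arrival_rates[s]` is the customer arrival rate for period s;
    `basket_pmf[k-1]` is the probability that a customer wants k units."""
    # Each customer wanting k units buys min(k, c) under the limit.
    units_per_customer = sum(min(k, c) * p
                             for k, p in enumerate(basket_pmf, start=1))
    return sum(arrival_rates) * units_per_customer

# Hypothetical inputs: 100 customers per period over two periods,
# and P(1 unit) = 0.5, P(2) = 0.3, P(3) = 0.2.
pmf = [0.5, 0.3, 0.2]
rates = [100, 100]
print(round(expected_group_demand(rates, pmf, c=2), 2))   # 300.0 (2-unit limit)
print(round(expected_group_demand(rates, pmf, c=10), 2))  # 340.0 (limit not binding)
```

Tightening the limit from c = 10 to c = 2 reduces the expected demand from 340 to 300 units, which is exactly the inventory the rationing policy preserves for additional customers.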

We next showcase the impact of our simulation tool applied to the panic-buying event that occurred in March 2020. Specifically, we use the historical data from the toilet paper category prior to March 12, 2020 to calibrate our model and then simulate the results for the subsequent week. Our findings suggest that the substantial stockout in the toilet paper category during March 12–19, 2020 (i.e., the crux of the panic-buying period in North America) could have been avoided by imposing a limit of one item per customer starting on March 12. In addition, imposing such a limit in a representative store would have increased the average number of unique customers served per day from 675 to 1,058, corresponding to a 56.74% increase in access to essential products.9 Policies of this kind, if implemented properly, can democratize access for a wider population without incurring any cost to the retailer.

We note that the above simulation exercise is based on historical data and may not accurately reflect the actual impact. To address this shortcoming, we next present a simulation test based on a future panic-buying event.

Simulation of future panic-buying waves

We simulate several relevant scenarios to capture a typical wave of panic buying. An event of panic buying is characterized by the following three features:

  1. Panic-buying wave strength: In a panic-buying event, the demand for a specific category of products (e.g., toilet paper) will substantially increase. We, thus, consider three strength levels: half, full, and double. We use data from the panic-buying wave triggered by the COVID-19 pandemic in March 2020 to calibrate the full strength. We then consider a milder version (half strength) and a stronger version (double strength).

  2. Basket distribution: Using historical data, retailers can calculate the distribution of the number of units for each product group. This distribution can be time dependent (e.g., different for each day of the week and month of the year). In a panic-buying event, the basket distribution will be greatly shifted. In our simulation, we consider the following basket distribution: (0.1, 0.1, 0.15, 0.15, 0.2, 0.3), where the i-th number corresponds to the proportion of customers who purchase i units (i = 1,…, 6). In this distribution, 30% of the customers will buy six products.

  3. Duration of the panic-buying wave: Another important parameter is the duration of the panic-buying wave. In our simulation, we consider a 7-day event and a 14-day event.

In addition, we need to consider the levers available to the retailer. First, the retailer must decide on the value of the rationing limit c. Motivated by practical considerations, we consider c = 1, 2, 3, as well as the scenario with no rationing limit. A second important parameter is the amount of initial inventory at the beginning of the panic-buying wave, for which we consider two values (simple and double): the simple value equals the initial inventory observed at the start of the March 2020 panic-buying event in the toilet paper category, and the double value is twice as high.

To determine the projected inventory level of the category under different rationing limits, the (random) inventory position $\tilde{I}_{i,c}(t')$ of product i at time t′ under quantity limit c can be computed as $\tilde{I}_{i,c}(t') = I_i(t) - \tilde{D}_{i,c}(t,t')$, where $I_i(t)$ is the known inventory position of product i at the current time t, and $\tilde{D}_{i,c}(t,t')$ denotes the total random demand for product i during the time interval (t,t′) under limit c. Practically, we can approximate the expected inventory level of product i under limit c at time t′ by

$$E_P\bigl[\tilde{I}_{i,c}(t')\bigr] \approx I_i(t) - E_P\bigl[\tilde{D}_{i,c}(t,t')\bigr],$$

where $E_P[\tilde{D}_{i,c}(t,t')]$ denotes the estimated demand for product i during the time interval (t,t′) under limit c. The results of our simulations are reported in Tables 8 and 9 for durations of 7 and 14 days, respectively. As discussed, we consider three different levels of panic-buying wave strength and two levels of initial inventory. We first report the number of "inventory coverage days," that is, the number of days before all the items in the category are out-of-stock. The higher this number, the better ("7+" means that the inventory lasts for at least a week, the duration of the panic-buying wave). We also report the number of baskets served under each scenario. As expected, imposing a stricter rationing limit (i.e., a smaller value of c) increases both the inventory coverage days and the number of baskets served. Overall, our tool allows retailers to test "what-if" scenarios and better understand the various trade-offs at play. For example, it allows the retailer to identify situations in which imposing c = 2 is sufficient and a lower value brings no additional benefit. Ultimately, the tool allows retailers to make informed decisions regarding rationing policies in a data-driven fashion.

Table 8.

Stochastic simulation results for a future panic-buying event of 7 days

Values are expectations; percentages indicate the increase relative to the no-limit scenario.

| Wave strength | Limit per customer | Coverage days (simple inv.) | Baskets served (simple inv.) | Coverage days (double inv.) | Baskets served (double inv.) |
| --- | --- | --- | --- | --- | --- |
| Half (×0.5) | 1-item limit | 7+ | 1,766.70 (+47%) | 7+ | 1,769.44 |
| Half (×0.5) | 2-item limit | 7+ | 1,770.19 (+47%) | 7+ | 1,766.36 |
| Half (×0.5) | 3-item limit | 6.76 | 1,713.39 (+43%) | 7+ | 1,766.25 |
| Half (×0.5) | No limit | 4.03 | 1,200.88 | 7+ | 1,766.15 |
| Full (×1) | 1-item limit | 7+ | 3,527.49 (+194%) | 7+ | 3,530.65 (+47%) |
| Full (×1) | 2-item limit | 4.61 | 2,622.66 (+118%) | 7+ | 3,533.30 (+47%) |
| Full (×1) | 3-item limit | 2.80 | 1,845.60 (+54%) | 6.75 | 3,428.76 (+43%) |
| Full (×1) | No limit | 1.63 | 1,200.97 | 4.02 | 2,401.36 |
| Double (×2) | 1-item limit | 4.26 | 4,983 (+315%) | 7+ | 7,035.47 (+193%) |
| Double (×2) | 2-item limit | 1.81 | 2,622.72 (+118%) | 4.61 | 5,245.12 (+118%) |
| Double (×2) | 3-item limit | 1.18 | 1,845.50 (+54%) | 2.80 | 3,691.06 (+54%) |
| Double (×2) | No limit | 0.74 | 1,200.86 | 1.62 | 2,401.39 |

Table 9.

Stochastic simulation results for a future panic-buying event of 14 days

Values are expectations; percentages indicate the increase relative to the no-limit scenario.

| Wave strength | Limit per customer | Coverage days (simple inv.) | Baskets served (simple inv.) | Coverage days (double inv.) | Baskets served (double inv.) |
| --- | --- | --- | --- | --- | --- |
| Half (×0.5) | 1-item limit | 14+ | 2,722.41 (+127%) | 14+ | 2,718.36 (+14%) |
| Half (×0.5) | 2-item limit | 12.67 | 2,528.60 (+111%) | 14+ | 2,715.85 (+14%) |
| Half (×0.5) | 3-item limit | 7.78 | 1,845.47 (+54%) | 14+ | 2,720.54 (+14%) |
| Half (×0.5) | No limit | 4.04 | 1,200.89 | 11.53 | 2,378.90 |
| Full (×1) | 1-item limit | 12.03 | 4,890.10 (+307%) | 14+ | 5,430.42 (+126%) |
| Full (×1) | 2-item limit | 4.61 | 2,622.61 (+118%) | 12.65 | 5,063.12 (+111%) |
| Full (×1) | 3-item limit | 2.78 | 1,845.59 (+54%) | 7.79 | 3,691.03 (+54%) |
| Full (×1) | No limit | 1.63 | 1,200.98 | 4.04 | 2,401.72 |
| Double (×2) | 1-item limit | 4.25 | 4,983 (+315%) | 12.06 | 9,779.28 (+307%) |
| Double (×2) | 2-item limit | 1.81 | 2,622.79 (+118%) | 4.60 | 5,245.27 (+118%) |
| Double (×2) | 3-item limit | 1.18 | 1,845.60 (+54%) | 2.81 | 3,691.10 (+54%) |
| Double (×2) | No limit | 0.74 | 1,200.67 | 1.63 | 2,401.51 |

In our simulation, we assume that consumer demand would not change if the retailer implemented a rationing policy. In the context of this study, it is indeed highly unlikely that demand for products subject to panic buying would be elastic to the retailer’s rationing policy, that is, unlikely that demand would be lower (or higher) as a result of the policy. Consequently, we can safely assume that the demand values are independent of the model parameters, a distinctive feature that our prescriptive simulation exploits.

We simulate panic-buying behavior using actual inventory and basket distributions during the panic buying of March 2020. For each simulation scenario, we generate basket distributions according to a predefined probability distribution of the number of items in a basket for each category. The number of baskets generated represents the strength of the panic-buying wave; that is, the stronger the panic-buying wave, the larger the number of baskets (demand). We consider three wave-strength configurations: half strength, full strength, and double strength. The full-strength configuration means that the simulation mimics the actual demand recorded during the panic buying of March 2020. We simulate the half-strength and double-strength waves by halving (i.e., 0.5 × March 2020 demand) and doubling (i.e., 2 × March 2020 demand) the full-strength wave, respectively. For each configuration, we generate 10,000 samples and compute the expected values, which are reported in Tables 8 and 9. More specifically, the data points are uniformly distributed between 0.4 and 0.6 for the half-strength simulation, between 0.8 and 1.2 for the full-strength simulation, and between 1.6 and 2.4 for the double-strength simulation (each simulation includes a ± 20% range around the nominal wave strength). The initial inventory value considered for the simulation is the actual inventory value as of March 12, 2020, and we also consider a scenario in which the initial inventory is doubled. Finally, the different scenarios represent the rationing policies implemented where the number of items per basket is capped at 1, 2, or 3, or no limit is set. We report the number of baskets served during a period of 7 or 14 days, as well as the number of days before running out-of-stock. 
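
The simulation procedure above can be condensed into a small Monte Carlo sketch. The function below is illustrative only (all names are ours, not the production tool): it draws a ±20% wave-strength multiplier around the nominal (full-strength) demand, samples basket sizes from a given distribution, applies the per-customer cap c, and reports the expected coverage days and baskets served.

```python
import random

def simulate_rationing(initial_inventory, baskets_per_day, basket_dist,
                       limit, horizon_days, n_samples=1000, seed=0):
    """Monte Carlo estimate of (expected coverage days, expected baskets served)
    under a per-customer item limit. Illustrative sketch, not the paper's code."""
    rng = random.Random(seed)
    sizes, probs = zip(*basket_dist)          # basket-size distribution
    eff_min = min(sizes) if limit is None else min(min(sizes), limit)
    total_cov = total_served = 0.0
    for _ in range(n_samples):
        strength = rng.uniform(0.8, 1.2)      # +/-20% noise around the nominal wave
        inv, served = initial_inventory, 0
        cov = horizon_days                    # stays at horizon -> "7+" / "14+"
        for day in range(horizon_days):
            for _ in range(round(baskets_per_day * strength)):
                want = rng.choices(sizes, weights=probs)[0]
                take = want if limit is None else min(want, limit)
                if inv >= take:               # basket served only if fully covered
                    inv -= take
                    served += 1
            if inv < eff_min and cov == horizon_days:
                cov = day + 1                 # category effectively out-of-stock
        total_cov += cov
        total_served += served
    return total_cov / n_samples, total_served / n_samples
```

Comparing, say, a 1-item limit against no limit for the same initial inventory reproduces the qualitative pattern of Tables 8 and 9: a stricter cap extends coverage and serves more baskets.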
As observed, by imposing a rationing policy informed by our proposed model, the retailer can extend the number of inventory coverage days and increase the number of baskets served, both of which have direct implications for consumer surplus and social welfare, since products are more likely to be available on the shelf.

Extension: urgent shipment to fulfill additional inventory

One can extend our simulation tool to prescribe urgent inventory shipments. To do so, the output of the scenario analysis is used to produce a near-future inventory estimate under a specific quantity limit for each item in the category, so as to determine whether an urgent shipment is necessary. This item-level expected demand estimate can be expressed as $\widetilde{EPD}_{i,c}(t,t') = \sum_{s=t}^{t'} \lambda_g^s \sum_{k=1}^{\min\{\tilde{k}_g, c\}} P(\tilde{k}_g = k)\, \phi_i^s$, where $\phi_i^s$ is the estimated proportion of the demand for product $i$ in group $g$ in period $s$. For example, we can determine $\phi_i^s$ using the commonly employed multinomial logit function based on the sales in the same group of products. More specifically, we denote by $m_i$ the total sales of product $i$ recently observed during periods with pertinent anomalies and by $A_{g,c}(t)$ the list of available products in group $g$ at time $t$ under limit $c$. We then have $\phi_i^t = m_i / \sum_{j \in A_{g,c}(t)} m_j$. Note that this reallocation of category demand based on the multinomial logit function essentially captures the fact that customers are willing to substitute an available product for another available product, which can be justified in a panic-buying event for certain categories of products.
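
The multinomial-logit reallocation described above reduces to a simple normalization of recent sales over the currently available products. A minimal sketch (hypothetical helper names, not the paper's code):

```python
def demand_shares(recent_sales, available):
    """phi_i = m_i / sum_{j in available} m_j: multinomial-logit-style share of
    group demand reallocated over currently available products (illustrative)."""
    total = sum(recent_sales[j] for j in available)
    return {i: recent_sales[i] / total for i in available}
```

For instance, if product C stocks out, its share of category demand is spread over A and B in proportion to their recent sales.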

We consider the inventory level of a product to be critical when the expected inventory level of the product, $\widetilde{EPI}_{i,c}(t)$, reaches a predefined critical level $\alpha_i$ chosen by the retailer. In practice, this critical inventory level can be directly derived from the approximate total demand during the replenishment lead time. If the approximate on-hand inventory is projected to reach this critical level prior to the original replenishment date, then the demand planner can consider placing a special request through the fulfillment system to replenish the product. The recommended minimum replenishment quantity $Q_{i,c}$ of product $i$ under limit $c$ must be sufficiently high to ensure that the inventory can satisfy the estimated demand until the next replenishment period $R$, that is, $Q_{i,c} = \operatorname{argmin}_q \{\, \widetilde{EPI}_{i,c}(R) + q \ge 0 \,\}$.
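
Under these definitions, the minimum urgent order is simply the gap between the projected inventory at the next replenishment period and the critical level. A hedged sketch of this rule (illustrative names, assuming the expected demand until R has already been estimated as above):

```python
import math

def urgent_order_quantity(on_hand, expected_demand_until_r, critical_level=0):
    """Smallest q such that the projected inventory at the next replenishment,
    on_hand - expected_demand_until_r + q, is at least critical_level.
    Illustrative reading of Q_{i,c} = argmin_q { EPI(R) + q >= 0 }."""
    projected = on_hand - expected_demand_until_r   # expected inventory at R
    return max(0, math.ceil(critical_level - projected))
```

With 40 units on hand and 55 units of expected demand until the next replenishment, the rule recommends ordering at least 15 units (25 if the retailer sets a critical level of 10).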

This extension to urgent inventory decisions shows how our proposed anomaly detection model can be leveraged to strategically adapt future inventory decisions, mitigate the occurrence of stockouts, and, ultimately, serve a larger number of customers.

In summary, our tool’s first mission was to successfully detect pertinent anomalies in the context of panic buying. Such anomalies may have adverse future consequences for the retailer. It is then the retailer’s decision how to respond: either by imposing an informed rationing policy or by placing an urgent inventory order (or both). Our simulation tool can guide retailers in making such critical decisions.

Implications and conclusion

In this section, we first present the practical implications of our AI-based framework. We then discuss how both the predictive and prescriptive tools developed in this paper can be adapted and used by a wide range of retailers. We close by reporting our conclusions.

Practical implications

As discussed, this work was directly motivated by practical considerations. In particular, the panic-buying behavior amid the COVID-19 pandemic caught most retailers off guard as they were not ready for such unprecedented panic-buying waves. In this paper, we propose an AI-based framework to detect early signals of panic-buying events, which are cast as anomalies. We then propose a prescriptive tool that can react to mitigate the adverse impact of panic-buying events. Overall, our framework includes predictive (detecting different types of anomalies by leveraging data) and prescriptive (setting data-driven rationing policies) components.

As discussed in Sect. 5, our framework was implemented across 42 stores run by our retail partner. This retail partner has made the decision to run our tool at specific stores with certain product categories to proactively detect future demand anomalies. In the event that an anomaly is detected, the tool will automatically send an alert to the appropriate store manager. The alert will include the specifics of the detected anomaly (which store, which category of products, the type of anomaly, and its severity score, along with interpretability measures). In addition, at the end of each day, a formal report will be sent to the store managers regarding the specifics of the detected anomalies. It will then be up to the store managers to decide whether to take any prescriptive action to preempt the impact of the detected anomalies (e.g., a rationing policy or increasing inventory orders). Interestingly, each store manager will have the flexibility to define the types of anomalies they would like to flag and detect. Finally, in the event of severe panic buying, our prescriptive tool will assist the retailer in setting the right rationing policy for the right set of products at the right time. Ultimately, our framework can be seen as preventive and can significantly help retailers during challenging and unexpected times. Our method can increase access to essential products and mitigate prolonged stockouts, which are detrimental to a store’s reputation.

Generalizability

Despite the first COVID-19 panic-buying wave being behind us, retailers still often experience unusual demand spikes due to shifts in customer behavior, which often lead to stockouts (Gopal 2021). To mitigate this issue, the predictive and prescriptive analytics approaches presented in this paper can be readily adapted by retailers in different verticals. Specifically, any convenience store or supermarket, regardless of its size and geographical location, can leverage transaction and basket data, in conjunction with labels indicating pertinent anomalies, to train our anomaly detection and classification models. The anomalies can be labeled using historical sales and inventory data by focusing on different business requirements, such as service-level failures (e.g., stockouts) and abnormalities in sales (e.g., sales spikes across multiple locations). The trained model can then be deployed either at the store or regional level to detect and classify anomalies. Store managers can then be promptly notified once pertinent anomalies have been detected, perform prescriptive “what-if” analyses, and make informed decisions on rationing policies and fulfillment actions.

Other types of retailers, such as pharmacies, have also suffered from panic buying, leading to shortages of supplies of many medication products, including cold and flu tablets, painkillers, and various prescription drugs. Rationing policies have also been put in place to control such situations (Tillett 2020). Similarly, electronics stores have experienced significant demand surges in multiple product categories since 2020 due to changes in customer behavior (Leswing 2021). To address these issues, pharmacies and electronics retailers can apply our two-step anomaly detection method to proactively detect subtle demand anomalies for any group of products. The models can be readily generalized and enhanced through the use of data and additional features in such contexts. More specifically, pharmacies can further incorporate drug-consumption coverage in the anomaly classification model to enhance the prediction of pertinent anomalies. Electronics retailers can leverage product return data in the ML-based classification model in a similar fashion. The outputs from the anomaly detection model can then be used to analyze the impact of imposing different rationing policies, as presented in Sect. 5.1.

Finally, the framework proposed in this paper can also be applied to e-commerce. The unsupervised autoencoder model we used has been effectively applied to a large-scale web application to detect and analyze anomalies in web traffic every minute (Xu et al. 2018a). In addition to the main features presented in Appendix F, online retailers can further leverage customer traffic and clickstream data to enhance the performance of the classification model. Our tool can then be deployed in an automated manner to analyze online retail sales and provide recommendations on quantity rationing, product offerings, and inventory fulfillment in real time.

Conclusion

In this paper, we leveraged AI tools and methods to help retailers detect panic-buying events and improve essential product distribution in uncertain times. More specifically, we proposed an anomaly detection model that can identify pertinent anomalies in real time. Detecting anomalies early allows retailers to be proactive and systematically react to anomalies before it is too late. In particular, retailers can activate a rationing policy (i.e., limiting the number of purchased products per customer) or decide to place an urgent inventory order. These actions would enhance the distribution of essential products, thereby benefiting both retailers and their customers.

We proposed a two-stage AI model in which the first stage detects anomalies from a statistical perspective, and the second stage classifies these anomalies according to their managerial pertinence. Our framework is flexible and can easily be adapted to various business settings. Since we defined anomalies from a business perspective, our model provides a clear interpretation of the detected anomalies. By applying our method to three product categories—with more than 15 million observations—we first established that our model yields high performance and can be scaled to a large number of categories and stores. In actual implementation, our anomaly detection model can run in the background and send scheduled emails (e.g., hourly, daily, weekly) to the demand planner on the detected anomalies for each store and category. These emails can include the type of anomaly, its severity level, and an interpretation. We then conducted a simulation analysis to develop a prescriptive analytics tool. Ultimately, our tool allows retailers to test “what-if” scenarios to strategically react and properly decide when and how to activate rationing policies. Finally, we simulated a future panic-buying wave to showcase the practical impact of our tool.

Nevertheless, our research has limitations that offer promising avenues for future research. First, our model was developed to detect anomalies based on sales data rather than actual customer demand realizations. Given that observed sales represent censored demand, it would be interesting to extend our method to uncensored demand data to truly detect demand anomalies. Second, our study only considered potential strategic reactions to panic buying by retailers; potential reactions from manufacturers are left as a future research direction. Lastly, while the counterfactual simulation results from our prescriptive analytics show that our proposed model positively impacts consumer surplus and social welfare by improving product availability, future research could aim to quantify this impact more formally.

In addition to its research impact, we believe that our method can yield a substantial societal impact. Specifically, it can help increase access to essential products during panic-buying waves. We have shown that a simple proactive strategy based on our results could have mitigated the overwhelming retail stockouts observed in March 2020 in North America and significantly increased access to essential products. This work demonstrates that by effectively leveraging suitable AI methods on large amounts of data, the retail world can be better for both firms and consumers.

Acknowledgements

The authors are listed in alphabetical order. The third author is the main contributor. The authors would like to thank the retail partner, IVADO Labs, and SCALE AI for making this work possible. We also thank Gregg Gilbert, Michael Krause, and Mehdi Ait Younes for their insightful comments that helped improve this paper.

Appendix A: Additional plots

Weekly sales for canned soup and household cleaners categories

See Appendix Figs. 11, 12.

Fig. 11. Canned soup weekly sales from January to May 2020 across 42 stores (the first week on the x-axis corresponds to the week of January 5). The y-axis is normalized for anonymity

Fig. 12. Household cleaners weekly sales from January to May 2020 across 42 stores (the first week on the x-axis corresponds to the week of January 5). The y-axis is normalized for anonymity

Tweets and news article plots for canned soup and household cleaners categories

See Appendix Figs. 13, 14.

Fig. 13. Daily volume of tweets and news articles related to the canned soup category

Fig. 14. Daily volume of tweets and news articles related to the household cleaners category

Anomalies in canned soup and household cleaners categories

See Appendix Figs. 15, 16, 17, 18.

Fig. 15. Anomalies detected in the canned soup category in one store

Fig. 16. Anomalies in the canned soup category classified as pertinent (colored) and non-pertinent (black)

Fig. 17. Anomalies detected in the household cleaners category in one store

Fig. 18. Anomalies in the household cleaners category classified as pertinent (colored) and non-pertinent (black)

Shapley values for canned soup and household cleaners categories

See Appendix Figs. 19, 20.

Fig. 19. Feature importance (Shapley values) of the top ten predictors in the second-stage classification for the canned soup category

Fig. 20. Feature importance (Shapley values) of the top ten predictors in the second-stage classification for the household cleaners category

Appendix B: Correlation table for external data

In Sect. 3 of the paper, we described the external data we have collected to supplement our model.

  1. First, we collect data from Google Trends. In particular, we record the volume of searches for the term “COVID-19,” as well as the volume of searches related to the product categories under consideration (i.e., we collect the daily search trends for the terms “toilet paper,” “household cleaners,” and “canned soup”). We limit the collection scope to searches within the metropolitan city in our dataset between January 1 and May 1, 2020.

  2. Second, we collect data from the social networking platform Twitter. Specifically, we collect tweets that contain one of the common COVID-19 hashtags defined in Lamsal (2020), tweets related to the product categories under consideration (i.e., tweets that mention toilet paper, household cleaners, or canned soup), or both. As with our Google Trends data collection, we limit the scope to tweets within the metropolitan city in our dataset between January 1 and May 1, 2020. Based on the collected data, we calculate the volume of the tweets as well as their sentiment.

  3. Third, we collect news article data from three local news outlets that operate in the same metropolitan city as our dataset. These are the top three local news outlets based on their Google search rank. We then collect all the news articles that mention the term COVID-19, mention one of the product categories under consideration, or both, between January 1 and May 1, 2020. Based on this dataset, we calculate the volume of the news articles and their sentiment.

We report the Pearson correlation coefficients of all the variables for the toilet paper category in Table 10. Similar patterns were also observed for the other product categories.

Table 10.

Correlation table for the toilet paper category

| | Trend | Trend COVID | News sentiment | News frequency | News COVID sentiment | News COVID frequency | Twitter sentiment | Twitter frequency | Twitter COVID sentiment | Twitter COVID frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| Trend | 1.0000 | 0.4798 | 0.0826 | 0.0275 | 0.0560 | 0.2984 | 0.3766 | 0.5719 | 0.0173 | 0.4251 |
| Trend COVID | 0.4798 | 1.0000 | 0.1390 | 0.1521 | 0.4220 | 0.8278 | 0.1919 | 0.3988 | 0.3852 | 0.8400 |
| News sentiment | 0.0826 | 0.1390 | 1.0000 | 0.8496 | 0.1765 | 0.2773 | -0.1728 | -0.0775 | 0.2056 | 0.1253 |
| News frequency | 0.0275 | 0.1521 | 0.8496 | 1.0000 | 0.1644 | 0.2682 | -0.1203 | -0.0719 | 0.2551 | 0.1250 |
| News COVID sentiment | 0.0560 | 0.4220 | 0.1765 | 0.1644 | 1.0000 | 0.4509 | 0.0053 | -0.0190 | 0.3819 | 0.3270 |
| News COVID frequency | 0.2984 | 0.8278 | 0.2773 | 0.2682 | 0.4509 | 1.0000 | 0.0900 | 0.1970 | 0.4040 | 0.7371 |
| Twitter sentiment | 0.3766 | 0.1919 | -0.1728 | -0.1203 | 0.0053 | 0.0900 | 1.0000 | 0.0642 | 0.0000 | 0.0573 |
| Twitter frequency | 0.5719 | 0.3988 | -0.0775 | -0.0719 | -0.0190 | 0.1970 | 0.0642 | 1.0000 | 0.0042 | 0.6532 |
| Twitter COVID sentiment | 0.0173 | 0.3852 | 0.2056 | 0.2551 | 0.3819 | 0.4040 | 0.0000 | 0.0042 | 1.0000 | 0.3344 |
| Twitter COVID frequency | 0.4251 | 0.8400 | 0.1253 | 0.1250 | 0.3270 | 0.7371 | 0.0573 | 0.6532 | 0.3344 | 1.0000 |

Appendix C: Overview of two-stage AI pipeline architecture

The architecture of the AI pipeline implemented on the cloud platform is shown in Fig. 21. The main data used in this pipeline comprise (i) transaction data, (ii) inventory and stockout records, and (iii) external data collected from Google Trends, Twitter, and news outlets. Historical data are processed and stored in the database, whereas live data are ingested from the company’s data streams.

Fig. 21. Illustration of the AI pipeline architecture

As described in Sect. 4 of the paper, the predictive framework consists of the first-stage and second-stage models. The labeling process is implemented and connected to the MLflow platform.10 This platform is commonly used to manage the lifecycle of ML processes, in particular, the records of experiments, including model parameters, code versions, metrics, and output files. We implemented the first-stage anomaly detection models using the EIF11 and Donut12 Python libraries, whereas the second-stage classification model was implemented using the LightGBM library.13 The trained ML models are stored in the MLflow pipeline and later used for prediction on live data. Pertinent anomalies detected by this two-stage ML prediction are saved, and pertinent anomaly reports are generated accordingly. The user can also run various what-if analyses (as described in Sect. 5) to evaluate and determine appropriate rationing policies and fulfillment plans if necessary.

Appendix D: AI-based methodologies for anomaly detection and classification

See Appendix Fig. 21

Extended isolation forest

The extended isolation forest (EIF) algorithm (Hariri et al. 2021) is an extension of the isolation forest algorithm (Liu et al. 2008), an unsupervised learning algorithm developed for anomaly detection. This algorithm differs from traditional statistical anomaly detection algorithms in that it does not directly rely on a probabilistic distribution of normal data points. Instead, this tree-based algorithm creates a set of trees (called a forest) by recursively partitioning the input data on one randomly chosen feature at a time until all individual data points are isolated at the leaf nodes. With this trained model, one can compute the anomaly score of a new data point by running it through all the trees and averaging the isolation depth across trees (a lower depth implies that the data point can be easily isolated due to its deviation from the other data points). The scalability and effectiveness of this approach are demonstrated in Hariri et al. (2021).
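
The isolation principle can be illustrated in a few lines of plain Python. This is a minimal, non-extended sketch of the idea (not the EIF library): each “tree” repeatedly picks a random feature and a random split value, and a point that separates from the rest after only a few splits receives a low average depth, i.e., a high anomaly likelihood.

```python
import random

def isolation_depth(rows, point, rng, depth=0, max_depth=10):
    """Depth at which `point` is separated from the other rows by random
    axis-aligned splits; shallow depth means easy to isolate (anomalous)."""
    if len(rows) <= 1 or depth >= max_depth:
        return depth
    f = rng.randrange(len(point))                 # pick a random feature
    lo = min(r[f] for r in rows)
    hi = max(r[f] for r in rows)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)                   # pick a random split value
    same_side = [r for r in rows if (r[f] < split) == (point[f] < split)]
    return isolation_depth(same_side, point, rng, depth + 1, max_depth)

def anomaly_score(data, point, n_trees=100, seed=0):
    """Average isolation depth over a forest; LOWER means MORE anomalous."""
    rng = random.Random(seed)
    return sum(isolation_depth(data, point, rng) for _ in range(n_trees)) / n_trees
```

Scoring a point far outside a dense cluster yields a much smaller average depth than scoring a point inside the cluster, which is exactly the signal the first-stage model exploits.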

A deep-learning-based anomaly detection algorithm: Donut

Donut (Xu et al. 2018a) is a multivariate anomaly detection algorithm based on a deep-learning architecture called the variational autoencoder (VAE) (Kingma and Welling 2013), which can be used to train generative models that learn the distribution of the input data (training set). This autoencoder-based method is trained to compress (i.e., encode) the input into a latent vector in a lower-dimensional space and then reconstruct (i.e., decode) the input from the latent vector using an end-to-end deep neural network. The deep-learning-based Donut algorithm is particularly suitable for large-scale datasets (such as the transaction data in our case) with missing values. Since the algorithm makes use of kernel density estimation (KDE) in the reconstruction process, one can also obtain probabilistic insights and interpretability information (for more details, see Xu et al. 2018a).
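
A full VAE is beyond a short sketch, but the reconstruction-error principle behind Donut can be illustrated with a linear autoencoder (equivalently, PCA) in NumPy: each row (a sales window) is encoded into a low-dimensional latent space and decoded back, and rows that reconstruct poorly are flagged as anomalous. This is an illustrative stand-in, not the Donut implementation.

```python
import numpy as np

def reconstruction_scores(X, n_components=2):
    """Anomaly scores via a linear encode/decode: project each row onto the
    top principal components and back; the reconstruction error is the score."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]          # encoder directions (k x d)
    Z = Xc @ W.T                   # encode: latent representation
    X_hat = Z @ W + mu             # decode: reconstruction
    return np.linalg.norm(X - X_hat, axis=1)
```

A row that breaks the low-dimensional structure of the data receives a reconstruction error that is orders of magnitude larger than the rest, mirroring how the VAE assigns low reconstruction probability to anomalous windows.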

LightGBM

LightGBM is a decision-tree-based algorithm proposed by Ke et al. (2017) that has been widely adopted in various ML applications due to its scalability and performance (e.g., see Hancock and Khoshgoftaar 2021; Makridakis et al. 2021). This type of tree-based model is highly suitable for capturing complex non-linear relationships among features. Unlike traditional tree-based ML algorithms, LightGBM grows trees leaf-wise using histogram-based node splitting, allowing low memory usage and fast training (more details can be found in the library documentation). Thus, LightGBM is capable of dealing with large-scale and complex data. In addition, tree-based models are useful for practitioners due to their interpretability (Ke et al. 2017). In particular, one can directly obtain feature importance and plot prediction trees using the LightGBM library.

Appendix E: Step-by-step description of input feature preparation

See Appendix Tables 11, 12, 13, 14.

  1. Aggregate sales data: We first aggregate the transaction sales data by category into the total sales quantity of each 5-min interval. An example of the aggregate sales data is provided below.

  2. Input features for anomaly detection: The feature inputs of the ML-based anomaly detection models are created using the sliding-window method (Sejnowski and Rosenberg 1987). To create the input features, we use a 3-h window comprising 36 aggregated observations, each corresponding to a 5-min interval at the store level. More specifically, the first observation based on the above example comprises the sales quantities of the 5-min intervals between 11:00 and 14:00, the second observation corresponds to the 5-min intervals between 11:05 and 14:05, and so on. We also conducted a small computational experiment varying the time window and found no meaningful differences.

  3. Outputs of first-stage model: The feature inputs generated in Step 2 are used in the first-stage model, which indicate whether or not each observation in the input is considered as an anomaly as shown in the last column of Table 13. Since our framework leverages two ML-based anomaly detection models (EIF and Donut), an observation is considered anomalous if it is flagged as an anomaly by at least one of the anomaly detection models.

  4. Input features for pertinent anomaly classification: In this step, we only retain the anomalies flagged in the first-stage model (i.e., labeled as Y in Table 13). For each detected anomaly, we obtain the input features (i.e., predictors) for the second-stage model to classify pertinent anomalies using both internal and external data sources. As shown in Table 14, we create 29 input variables (as listed in Table 15) derived from internal data and 5 input variables derived from Google trends, Twitter, and news data as described in Sect. 3. Since we use the inputs from multiple stores in the same region in the pertinent anomaly classification model, the sales data (i.e., quantity and number of baskets) of each store are transformed using a z-score normalization. In other words, features 1–15 in Table 15 are computed using the normalized sales data from each store. In the last column, we create the label indicating whether the anomaly is pertinent by checking if one of the consequences described in Sect. 4.2 has occurred. This column is used as the label in the supervised ML model.
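
Steps 2 and 4 above can be sketched as two small helpers (illustrative, not the production pipeline): one builds the 36-point sliding windows from the 5-min aggregates, and one applies the z-score normalization used before pooling stores.

```python
def sliding_windows(sales_5min, window=36):
    """Step 2: each observation is `window` consecutive 5-min totals
    (36 intervals = 3 h), shifted forward by one interval at a time."""
    return [sales_5min[i:i + window] for i in range(len(sales_5min) - window + 1)]

def z_normalize(values):
    """Step 4: z-score normalization (mean 0, standard deviation 1) applied to
    each store's sales before pooling stores in the classification model."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]
```

For a day of 5-min aggregates, each successive window overlaps the previous one in all but its first and last intervals, exactly as in Tables 12 and 13.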

Table 11.

Example of aggregate sales data of each 5-min interval

| Time interval | 11:05 | 11:10 | 11:15 | 11:20 | … | 13:50 | 13:55 | 14:00 | 14:05 | 14:10 | 14:15 | … | 16:45 | 16:50 | 16:55 | 17:00 | 17:05 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sales quantity | 2 | 1 | 3 | 2 | … | 3 | 2 | 4 | 0 | 2 | 5 | … | 3 | 2 | 4 | 0 | 2 |

Table 12.

Input features derived from the aggregate sales data

| First-stage features | 1 | 2 | 3 | 4 | … | 34 | 35 | 36 |
|---|---|---|---|---|---|---|---|---|
| Observation 1 (11:00–14:00) | 2 | 1 | 3 | 2 | … | 3 | 2 | 4 |
| Observation 2 (11:05–14:05) | 1 | 3 | 2 | 3 | … | 2 | 4 | 0 |
| Observation 3 (11:10–14:10) | 3 | 2 | 3 | 2 | … | 4 | 0 | 2 |
| … | | | | | | | | |
| Observation 35 (13:55–16:55) | 4 | 0 | 2 | 5 | … | 3 | 2 | 4 |
| Observation 36 (14:00–17:00) | 0 | 2 | 5 | 3 | … | 2 | 4 | 0 |
| Observation 37 (14:05–17:05) | 2 | 5 | 3 | 2 | … | 4 | 0 | 2 |

Table 13.

Output of the first-stage anomaly detection model (Y for yes, N for no)

| First-stage features | 1 | 2 | 3 | 4 | … | 34 | 35 | 36 | Anomaly flag |
|---|---|---|---|---|---|---|---|---|---|
| Observation 1 (11:00–14:00) | 2 | 1 | 3 | 2 | … | 3 | 2 | 4 | Y |
| Observation 2 (11:05–14:05) | 1 | 3 | 2 | 3 | … | 2 | 4 | 0 | N |
| Observation 3 (11:10–14:10) | 3 | 2 | 3 | 2 | … | 4 | 0 | 2 | Y |
| … | | | | | | | | | |
| Observation 35 (13:55–16:55) | 4 | 0 | 2 | 5 | … | 3 | 2 | 4 | Y |
| Observation 36 (14:00–17:00) | 0 | 2 | 5 | 3 | … | 2 | 4 | 0 | N |
| Observation 37 (14:05–17:05) | 2 | 5 | 3 | 2 | … | 4 | 0 | 2 | N |
| … | | | | | | | | | |

Table 14.

Input features of the second-stage model (pertinent anomaly classification)

| First-stage features | 1 | 2 | 3 | 4 | … | 34 | 35 | 36 | Anomaly flag |
|---|---|---|---|---|---|---|---|---|---|
| Observation 1 (11:00–14:00) | 2 | 1 | 3 | 2 | … | 3 | 2 | 4 | Y |
| Observation 2 (11:05–14:05) | 1 | 3 | 2 | 3 | … | 2 | 4 | 0 | N |
| Observation 3 (11:10–14:10) | 3 | 2 | 3 | 2 | … | 4 | 0 | 2 | Y |
| Observation 35 (13:55–16:55) | 4 | 0 | 2 | 5 | … | 3 | 2 | 4 | Y |
| Observation 36 (14:00–17:00) | 0 | 2 | 5 | 3 | … | 2 | 4 | 0 | N |
| Observation 37 (14:05–17:05) | 2 | 5 | 3 | 2 | … | 4 | 0 | 2 | N |

Appendix F: List of features

See Appendix Table 15.

Author contributions

Conceptualization: YA, MC, WK-a-n. Methodology: OB, AC. Investigation: YA, OB, AC, MC, WK-a-n. Visualization: YA, OB, AC, MC, WK-a-n. Writing: YA, MC, WK-a-n.

Funding

This work is financially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant number RGPIN-2021-02657. The first and fourth authors are part-time advisors to IVADO Labs, and the fifth author was a part-time advisor to the same organization when this research was completed. There are no competing interests to declare that are relevant to the content of this article.

Data availability

The data that support the findings of this study are supplied by our retail partner, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the retail partner.

Declarations

Conflict of interest

The authors declare no competing interests.

Footnotes

2

In addition to the three main categories reported in the paper, four other categories, namely Baking Ingredients, Water, Cheese, and Soap, were initially chosen by the retailer for this project. However, preliminary analyses revealed that the proportion of detected anomalies in these categories was small (less than 0.78%). As a result, the retailer decided to exclude these four categories from the study and focus on the three categories we presented.

3

There are of course additional sources of external data available. However, it is important to note that in the retail industry, it is often illegal to collect granular data from competitors and/or manufacturers and to use these data in either predictive or prescriptive models. Meanwhile, it is possible to purchase aggregate-level data (e.g., at the geography, quarter, or industry level) from third-party providers. Nevertheless, our retail partner felt that the cost of acquiring, cleaning, and ingesting these data was not justified, given that such external datasets may not be granular enough to be meaningfully used in our models.

4

Several variables in our collected external data are naturally correlated. As shown in Appendix B, however, none of the correlation coefficients are extreme. Since we are focusing on a predictive task, we measure the usefulness of these variables by comparing our model performance before and after including these variables in the model, as shown in Tables 5 and 6.

5

We also considered several alternative approaches, including a statistical-based method and Prophet (Taylor and Letham 2018). Both these approaches yielded lower performance relative to our two primary methods.

6

We use a different training set for the EIF model for the sake of efficiency. However, the results remain qualitatively the same when we use the same training set for both models.

7

A z-score normalization refers to normalizing every value in a dataset such that the mean of all of the values is 0 and the standard deviation is 1.

9

Our simulation shows that when our model is used in conjunction with the rationing policy, we would be able to conduct 56.74% more transactions. Since the rationing policy limits the purchase of these in-demand products for each household, we can infer that our model would be able to distribute these products to 56.74% additional households without accessing consumer-level data.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Anstead N, O'Loughlin B. Social media analysis and public opinion: the 2010 UK general election. J Comput-Mediat Commun. 2015;20(2):204–220. doi: 10.1111/jcc4.12102.
  2. Arafat SY, Kar SK, Marthoenis M, Sharma P, Apu EH, Kabir R. Psychological underpinning of panic buying during pandemic (COVID-19). Psychiatry Res. 2020;289:113061. doi: 10.1016/j.psychres.2020.113061.
  3. Armani AM, Hurt DE, Hwang D, McCarthy MC, Scholtz A. Low-tech solutions for the COVID-19 supply chain crisis. Nat Rev Mater. 2020;5:1–4. doi: 10.1038/s41578-020-0205-1.
  4. Arumita A (2020) Changes in the structure and system of the shopping center area due to COVID-19. Available at SSRN 3590973.
  5. Bakshy E, Messing S, Adamic LA. Exposure to ideologically diverse news and opinion on Facebook. Science. 2015;348(6239):1130–1132. doi: 10.1126/science.aaa1160.
  6. Birim S, Kazancoglu I, Mangla SK, Kahraman A, Kazancoglu Y (2022) The derived demand for advertising expenses and implications on sustainability: a comparative study using deep learning and traditional machine learning methods. Annals of Operations Research (Forthcoming).
  7. Chakraborti R, Roberts G (2020) Learning to hoard: the effects of preexisting and surprise price-gouging regulation during the COVID-19 pandemic. Available at SSRN: https://ssrn.com/abstract=3672300.
  8. Chalapathy R, Chawla S (2019) Deep learning for anomaly detection: a survey. arXiv preprint arXiv:1901.03407.
  9. Chen H, De P, Hu YJ, Hwang BH. Wisdom of crowds: the value of stock opinions transmitted through social media. The Review of Financial Studies. 2014;27(5):1367–1403. doi: 10.1093/rfs/hhu001.
  10. Choi TM, Chan HK, Yue X. Recent development in big data analytics for business operations and risk management. IEEE Transactions on Cybernetics. 2016;47(1):81–92. doi: 10.1109/TCYB.2015.2507599.
  11. Choi TM, Wallace SW, Wang Y. Big data analytics in operations management. Prod Oper Manag. 2018;27(10):1868–1883. doi: 10.1111/poms.12838.
  12. Cohen MC, Leung NHZ, Panchamgam K, Perakis G, Smith A. The impact of linear optimization on promotion planning. Oper Res. 2017;65(2):446–468. doi: 10.1287/opre.2016.1573.
  13. Cohen MC, Dahan S, Rule C. Conflict analytics: when data science meets dispute resolution. Manag Business Rev. 2022;2(2):86–93.
  14. Cohen MC, Perakis G, Thraves C. Consumer surplus under demand uncertainty. Prod Oper Manag. 2022;31(2):478–494. doi: 10.1111/poms.13554.
  15. Cohen MC, Dahan S, Khern-am-nuai W, Shimao H, Touboul J (2023) The use of AI in legal systems: determining independent contractor vs. employee status. Artificial Intelligence and Law (Forthcoming).
  16. Croson R, Donohue K, Katok E, Sterman J. Order stability in supply chains: coordination risk and the role of coordination stock. Prod Oper Manag. 2014;23(2):176–196. doi: 10.1111/j.1937-5956.2012.01422.x.
  17. Cui R, Li M, Zhang S. AI and procurement. Manuf Serv Oper Manag. 2022;24(2):691–706. doi: 10.1287/msom.2021.0989.
  18. Edmiston J (2020) 'It's madness': panic buying leaves long lines and empty shelves at grocers across country. https://financialpost.com/news/retail-marketing/its-madness-panic-buying-leaves-long-lines-and-empty-shelves-at-grocers-across-country; last accessed 29-August-2020.
  19. Fisher M, Raman A. Using data and big data in retailing. Prod Oper Manag. 2018;27(9):1665–1669. doi: 10.1111/poms.12846.
  20. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29(5):1189–1232. doi: 10.1214/aos/1013203451.
  21. Furutani K (2020) People in Japan are panic-buying toilet paper due to COVID-19 coronavirus. https://www.timeout.com/tokyo/news/people-in-japan-are-panic-buying-toilet-paper-due-to-covid-19-coronavirus-030220; last accessed 29-August-2020.
  22. Gaikar D, Marakarkandy B. Product sales prediction based on sentiment analysis using Twitter data. Int J Comput Sci Inf Technol (IJCSIT). 2015;6(3):2303–2313.
  23. Gilbert C, Hutto E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. Eighth International Conference on Weblogs and Social Media (ICWSM-14), vol 81, 82.
  24. Gopal VG (2021) How changes in consumer preferences and buying behaviour have caused more stock outs in 2021. https://startupsmagazine.co.uk/article-how-changes-consumer-preferences-and-buying-behaviour-have-caused-more-stock-outs-2021; last accessed 21-March-2021.
  25. Hamister JW, Magazine MJ, Polak GG. Integrating analytics through the big data information chain: a case from supply chain management. J Bus Logist. 2018;39(3):220–230. doi: 10.1111/jbl.12192.
  26. Han BR, Sun T, Chu LY, Wu L (2020) COVID-19 and e-commerce operations: evidence from Alibaba. Available at SSRN: https://ssrn.com/abstract=3654859.
  27. Hancock J, Khoshgoftaar TM (2021) Leveraging LightGBM for categorical big data. 2021 IEEE Seventh International Conference on Big Data Computing Service and Applications (BigDataService), 149–154 (IEEE).
  28. Hariri S, Kind MC, Brunner RJ. Extended isolation forest. IEEE Trans Knowl Data Eng. 2021;33(4):1479–1489. doi: 10.1109/TKDE.2019.2947676.
  29. Hong J, Liu CC, Govindarasu M. Integrated anomaly detection for cyber security of the substations. IEEE Transactions on Smart Grid. 2014;5(4):1643–1653. doi: 10.1109/TSG.2013.2294473.
  30. Husain W, Xin LK, Jothi N, et al. (2016) Predicting generalized anxiety disorder among women using random forest approach. In: 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), 37–42.
  31. Ilk N, Shang G, Goes P. Improving customer routing in contact centers: an automated triage design based on text analytics. J Oper Manag. 2020;66(5):553–577. doi: 10.1002/joom.1084.
  32. Ivanov D, Dolgui A (2020) Viability of intertwined supply networks: extending the supply chain resilience angles towards survivability. A position paper motivated by COVID-19 outbreak. Int J Prod Res 58(10):2904–2915.
  33. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inform Process Syst 3146–3154.
  34. Khern-am-nuai W, So H, Cohen MC, Adulyasak Y (2022) Selecting cover images for restaurant reviews: AI vs. wisdom of the crowd. Available at SSRN: https://ssrn.com/abstract=3808667.
  35. Kingma DP, Welling M (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
  36. Lamsal R (2020) Coronavirus (COVID-19) tweets dataset. doi: 10.21227/781w-ef42.
  37. Larose DT (2015) Data mining and predictive analytics (John Wiley & Sons).
  38. Leswing K (2021) Why there's a chip shortage that's hurting everything from the PlayStation 5 to the Chevy Malibu. https://www.cnbc.com/2021/02/10/whats-causing-the-chip-shortage-affecting-ps5-cars-and-more.html; last accessed 21-March-2021.
  39. Li S, Zhang Z, Liu Y, Ng S. The closer I am, the safer I feel: the "distance proximity effect" of COVID-19 pandemic on individuals' risk assessment and irrational consumption. Psychol Mark. 2021;38(11):2006–2018. doi: 10.1002/mar.21552.
  40. Lins S, Koch R, Aquino S, de Freitas MC, Costa IM. Anxiety, depression, and stress: can mental health variables predict panic buying? J Psychiatr Res. 2021;144:434–440. doi: 10.1016/j.jpsychires.2021.11.008.
  41. Liu FT, Ting KM, Zhou Z. Isolation forest. Eighth IEEE International Conference on Data Mining. 2008:413–422.
  42. Liu FT, Ting KM, Zhou ZH. Isolation-based anomaly detection. ACM Trans Knowl Discovery from Data (TKDD). 2012;6(1):1–39. doi: 10.1145/2133360.2133363.
  43. Lufkin B (2020) Coronavirus: the psychology of panic buying. https://www.bbc.com/worklife/article/20200304-coronavirus-covid-19-update-why-people-are-stockpiling; last accessed 29-August-2020.
  44. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inform Process Syst (NeurIPS). 2017;30:4765–4774.
  45. Ma S, Tourani R (2020) Predictive and causal implications of using Shapley value for model interpretation. Proceedings of the 2020 KDD Workshop on Causal Discovery, 23–38.
  46. Makridakis S, Spiliotis E, Assimakopoulos V, Chen Z, Gaba A, Tsetlin I, Winkler RL (2021) The M5 uncertainty competition: results, findings and conclusions. Int J Forecasting.
  47. Mehrotra KG, Mohan CK, Huang H (2017) Anomaly detection principles and algorithms (Springer).
  48. Mitchell TW. Competitive illusion as a cause of business cycles. Q J Econ. 1924;38(4):631–652. doi: 10.2307/1884594.
  49. Naeem M, Ozuem W. Customers' social interactions and panic buying behavior: insights from social media practices. J Consum Behav. 2021;20:1191–1203. doi: 10.1002/cb.1925.
  50. Pamuru V, Kar W, Khern-am-nuai W (2022) Status downgrade: the impact of losing status on a user generated content platform. Available at SSRN: https://ssrn.com/abstract=3963415.
  51. Paula EL, Ladeira M, Carvalho RN, Marzagao T (2016) Deep learning anomaly detection as support fraud investigation in Brazilian exports and anti-money laundering. In: 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 954–960.
  52. Perera HN, Fahimnia B, Tokar T. Inventory and ordering decisions: a systematic review on research driven through behavioral experiments. Int J Oper Prod Manag. 2020;40(7/8):997–1039. doi: 10.1108/IJOPM-05-2019-0339.
  53. Prentice C, Chen J, Stantic B. Timed intervention in COVID-19 and panic buying. J Retail Consum Serv. 2020;57:102203. doi: 10.1016/j.jretconser.2020.102203.
  54. Qi M, Shi Y, Qi Y, Ma C, Yuan R, Wu D, Shen ZJM (2020) A practical end-to-end inventory management model with deep learning. Available at SSRN: https://ssrn.com/abstract=3737780.
  55. Qin L, Sun Q, Wang Y, Wu KF, Chen M, Shia BC, Wu SY. Prediction of number of cases of 2019 novel coronavirus (COVID-19) using social media search index. Int J Environ Res Public Health. 2020;17(7):2365. doi: 10.3390/ijerph17072365.
  56. Sabic E, Keeley D, Henderson B, Nannemann S. Healthcare and anomaly detection: using machine learning to predict anomalies in heart rate data. AI & Soc. 2021;36(1):149–158. doi: 10.1007/s00146-020-00985-1.
  57. Salem O, Guerassimov A, Mehaoua A, Marcus A, Furht B. Sensor fault and patient anomaly detection and classification in medical wireless sensor networks. IEEE Int Conf Commun (ICC). 2013:4373–4378.
  58. Samaras L, Garcia-Barriocanal E, Sicilia MA. Comparing social media and Google to detect and predict severe epidemics. Sci Rep. 2020;10(1):1–11. doi: 10.1038/s41598-020-61686-9.
  59. Sanders NR, Ganeshan R. Big data in supply chain management. Prod Oper Manag. 2018;27(10):1745–1748. doi: 10.1111/poms.12892.
  60. Sejnowski TJ, Rosenberg CR. Parallel networks that learn to pronounce English text. Complex Systems. 1987;1(1):145–168.
  61. Settanni E. Those who do not move, do not notice their (supply) chains—inconvenient lessons from disruptions related to COVID-19. AI & Soc. 2020;35(4):1065–1071. doi: 10.1007/s00146-020-00988-y.
  62. Shapley LS. A value for n-person games. Contributions Theory Games. 1953;2(28):307–317.
  63. Shimao H, Khern-am-nuai W, Kannan K, Cohen MC (2022) Strategic best response fairness in fair machine learning. In: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 664.
  64. Shin D. How do people judge the credibility of algorithmic sources? AI & Soc. 2022;37:81–96. doi: 10.1007/s00146-021-01158-4.
  65. Shin D. Algorithms, humans, and interactions: how do algorithms interact with people? Designing Meaningful AI Experiences. Taylor & Francis; 2023.
  66. Shin D, Kee KF, Shin EY. Algorithm awareness: why user awareness is critical for personal privacy in the adoption of algorithmic platforms? Int J Inf Manage. 2022;65:102494. doi: 10.1016/j.ijinfomgt.2022.102494.
  67. Shin D, Lim JS, Ahmad N, Ibahrine M (2022b) Understanding user sensemaking in fairness and transparency in algorithms: algorithmic sensemaking in over-the-top platform. AI & Society, 1–14.
  68. Sodhi M, Tang C (2020) Supply chain management for extreme conditions: research opportunities. J Supply Chain Manag.
  69. Sterman JD, Dogan G. "I'm not hoarding, I'm just stocking up before the hoarders get here": behavioral causes of phantom ordering in supply chains. J Oper Manag. 2015;39:6–22. doi: 10.1016/j.jom.2015.07.002.
  70. Tanlamai J, Khern-am-nuai W, Adulyasak Y (2022) Arbitrage opportunities predictions in retail markets and the role of user-generated content. Available at SSRN: https://ssrn.com/abstract=3764048.
  71. Taylor SJ, Letham B. Forecasting at scale. Am Stat. 2018;72(1):37–45. doi: 10.1080/00031305.2017.1380080.
  72. Tillett A (2020) Medicines rationed to stop panic buying. https://www.afr.com/politics/federal/medicines-rationed-to-stop-panic-buying-20200319-p54bsl; last accessed 21-March-2021.
  73. Tsyganov V (2021) Artificial intelligence, public control, and supply of a vital commodity like COVID-19 vaccine. AI & Society, 1–10.
  74. van Noordt C, Misuraca G. Artificial intelligence for the public sector: results of landscaping the use of AI in government across the European Union. Gov Inf Q. 2022;39(3):101714. doi: 10.1016/j.giq.2022.101714.
  75. Wang G, Gunasekaran A, Ngai EW, Papadopoulos T. Big data analytics in logistics and supply chain management: certain investigations for research and applications. Int J Prod Econ. 2016;176:98–110. doi: 10.1016/j.ijpe.2016.03.014.
  76. Wu PJ, Chien CL. AI-based quality risk management in omnichannel operations: O2O food dissimilarity. Comput Ind Eng. 2021;160:107556. doi: 10.1016/j.cie.2021.107556.
  77. Xu H, Chen W, Zhao N, Li Z, Bu J, Li Z, Liu Y, Zhao Y, Pei D, Feng Y, Chen J, Wang Z, Qiao H (2018a) Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. arXiv preprint arXiv:1802.03903.
  78. Xu H, Chen W, Zhao N, Li Z, Bu J, Li Z, Liu Y, Zhao Y, Pei D, Feng Y, et al. (2018b) Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. Proceedings of the 2018 World Wide Web Conference, 187–196.
  79. Yeoh W, Koronios A. Critical success factors for business intelligence systems. J Comput Inform Syst. 2010;50(3):23–32.
  80. Zheng R, Shou B, Yang J. Supply disruption management under consumer panic buying and social learning effects. Omega. 2021;101:102238. doi: 10.1016/j.omega.2020.102238.

Associated Data


Data Availability Statement

The data that support the findings of this study were supplied by our retail partner. Restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of the retail partner.


Articles from AI & Society are provided here courtesy of Nature Publishing Group
