Abstract
To design preventive policy measures against email phishing, it is helpful to be aware of the phishing schemes and trends that are currently applied. How phishing schemes and patterns emerge and adapt is an ongoing field of study. Existing work on phishing already reveals a rich set of phishing schemes, patterns, and trends that provide insight into the mechanisms used. However, there seems to be limited knowledge about how email phishing is affected in periods of social disturbance, such as the COVID-19 pandemic, during which phishing numbers quadrupled. Therefore, we investigate how the COVID-19 pandemic influenced the phishing emails sent during its first year. The email content (header data and HTML body, excluding attachments) is evaluated to assess three things: how the pandemic influenced the topics of phishing emails over time (peaks and trends), whether email campaigns correlate with momentous events and trends of the COVID-19 pandemic, and what the hidden content reveals. This is studied through an in-depth analysis of the bodies of 500.000 phishing emails addressed to Dutch registered top-level domains, collected during the start of the pandemic. The study reveals that most COVID-19 related phishing emails follow known patterns, indicating that perpetrators are more likely to adapt than to reinvent their schemes.
Keywords: COVID-19, Phishing, Cybercrime, Pandemic, Pattern shifts, Dutch firms
1. Introduction
The crisis resulting from the COVID-19 pandemic has had profound implications worldwide. These include effects on global health and health systems (Walker et al., 2020), on the global social and economic situation, and on almost every other aspect of daily life (Atkeson, 2020, Nicola, Alsafi, Sohrabi, Kerwan, Al-Jabir, Iosifidis, Agha, Agha, 2020). In particular, lockdown measures and social distancing caused a great change in the routine activities of many people. For instance, in countries around the world, the pandemic had a dramatic impact on travel patterns, such as the number of trips, distances travelled, purpose of travel, and choice of travel mode (Cats and Hoogendoorn, 2020). There was a decrease in the use of cars and public transport, as well as an increase in walking and cycling, which involved more recreational trips. Activity patterns also shifted towards online shopping. Dutch data showed a shift in movements in time and space, but not necessarily in the number of trips that people made. For example, pedestrian data show more walks in parks at weekends, while far fewer people walked on the streets (Cats and Hoogendoorn, 2020). Overall, the Dutch went out less often to buy groceries, shop, exercise, and visit people (de Haas et al., 2020). Much research shows that crime is relatively strongly related to crime opportunities and to people's routine activities. The amount of time individuals spend outdoors and the activities they are involved in correlate strongly with their likelihood of becoming a victim of a broad variety of crime types. These include property crime (Kennedy, Forde, 1990, van Kesteren et al., 2013), violence (Sherman, Gartin, Buerger, 1989, Tilley, Sidebottom, 2015), and fraud (Holtfreter et al., 2008). As COVID-19 changed opportunities for crime, it is plausible that the lockdown affected crime rates. This also suggests that the societal changes caused by COVID-19 would affect trends in crime-related activities.
The US (Ashby, 2020, Boman, Gallupe, 2020, Bullinger, Carr, Packham, 2020, Felson, Jiang, Xu, 2020, Mohler et al., 2020) and Canada (Hodgkinson and Andresen, 2020) took a more non-committal approach to COVID-19 restrictions (e.g., lockdowns). In both countries, declines in physical crime were indeed found during the pandemic, but overall the results were relatively inconsistent. The studies show that there were usually no significant changes in the frequency of serious assaults, either in public or in residences. In some US cities, there were reductions in residential burglary but little change in non-residential burglary (Ashby, 2020, Boman, Gallupe, 2020, Bullinger, Carr, Packham, 2020, Felson, Jiang, Xu, 2020, Mohler et al., 2020). European studies seemed to find a stronger impact of the measures taken to fight the virus. In France, the recorded figures for almost every type of crime showed a very strong decline during the lockdown, and fraud also declined overall (InterStats, 2020). Similarly, in the UK, victim surveys found a decline in crime of 32% (excluding fraud and cybercrime) and a similar decline of 31% in police-recorded crime. Fraud and computer misuse also fell, by 16% (Office for National Statistics UK, August, 2020). These findings illustrate the extent to which offenders are responsive to context and react quickly and flexibly to changed circumstances. It seems plausible that the impact on crime is proportional to how stringent the measures taken by each government were and how closely citizens followed them. Stringency index numbers during the first lockdown (March 2020) provide an indication of this difference (Mathieu et al., 2020). This might explain the difference between the USA and Canada on the one hand and Europe on the other. It also suggests that the lockdown had a stronger impact on routine activities and crime opportunities in Europe than in North America.
Fewer studies have investigated the impact of the COVID-19 crisis on cybercrime. Physical crime, e.g., property crime, decreased during the first COVID-19 outbreak in Europe (Europol, 2020). At the same time, a noticeable shift and surge towards online fraudulent activities took place (Buil-Gil et al., 2020). In particular, a significant increase was observed in phishing, which quadrupled during the outbreak (APWG, 2020a) and has increased eightfold since then (APWG, 2022). In the Netherlands and other countries, phishing is considered a criminal activity and is actively prosecuted (Rechtsraak, 2022). Fraudsters have often "benefited" from disasters (Aguirre and Lane, 2019), and attackers have made extensive use of the COVID-19 crisis to design phishing emails. Typical examples reported in the media are Zoom phishing emails, fraudulent CEO emails, and phishing emails aimed at healthcare institutions (APWG, 2020b). This sudden rise of COVID-19 phishing fraud as a global problem may be explained by the outbreak itself. The social disturbance resulting from a disaster typically makes society more vulnerable to fraudulent activities and hence more susceptible to phishing attacks (Aguirre and Lane, 2019). We should be aware of the magnitude of the impact these COVID-19 related fraudulent activities may cause, in particular because this impact is often underestimated (Lastdrager, 2018). Apart from being an effective means of direct financial gain (Laan, 2021), phishing is also the typical starting point of successful cyber attacks and the resulting data breaches. These are, of course, associated with all sorts of financial losses (CNBC, Lastdrager, 2018). Such organizational and societal costs call for preventive measures that increase resilience against cyber attacks. Examples are awareness campaigns and the ability to scale customer support in time when novel phishing schemes or adaptations are noticed or expected.
Based on these observations, we must recognize the importance of analyzing the new phishing behavior that appeared during the pandemic. Therefore, this study focuses on COVID-19 related phishing emails to better understand how attackers adapt to new societal conditions.
This analysis yields two types of contributions. On the one hand, this paper reveals patterns of fraudulent characteristics applied in phishing schemes. These may be used to support societal phishing-awareness campaigns that reduce the societal costs of cybercrime. On the other hand, this paper shows that the adaptation of existing phishing schemes is commonly observed and seems to be preferred over the development of novel schemes. The paper further reflects on this adaptation choice by attackers from multiple theoretical perspectives to explain this behavior.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 explains the research method. Section 4 presents the results of the COVID-19 phishing email analysis. Section 5 discusses the results, and Section 6 concludes.
2. Related work
The increase in cyber attacks during the pandemic has triggered several studies on how cybercriminals develop new attack strategies and fraudulent operations. These studies roughly focus on two types of cybercrime: cyber-dependent crime (Furnell et al., 2015) and cyber-enabled crime (Lallie et al., 2021). Cyber-dependent crime covers hacking, malware, and denial of service, whereas cyber-enabled crime covers financial fraud, phishing, pharming, and extortion. This section reviews the literature on cybercrime during the COVID-19 pandemic, with a focus on phishing.
Several studies have aimed to prevent cybercrime during the COVID-19 pandemic. In a case study by Groenendaal and Helsloot (2021), cyber resilience during COVID-19 is analyzed and possible approaches for improvement are discussed. The authors followed a well-accepted theory in cyber resilience, i.e., the resilience analysis grid proposed by Hollnagel (2017). The grid allows organisations to measure their performance on four potentials: (i) anticipate, (ii) monitor, (iii) respond, and (iv) learn. The potentials are suggested to be interdependent, so aligning them would create better cyber resilience. In addition, several studies address the need for more awareness of cyber attacks, given their increase during the crisis. Alzubaidi (2021) contributed to this direction by surveying the level of cyber awareness in Saudi Arabia. The study discusses the most common security tools used by internet users and their cyber security habits. It concludes that security awareness training is a must for organizations, especially regarding phishing, given the increase in cyber attacks. Cyber criminals frequently use different mechanisms for online scamming, so awareness of possible strategies helps protect users from such crimes. The pandemic offered criminals a chance to apply these mechanisms to people who were already worried. Chawki (2021) performed case studies in the USA and the European Union and proposed plausible ways to safeguard online users from such attacks. Results from cyber-criminal forums indicate that healthcare agencies were prime targets for such fraudulent activities, in which attackers gain access to confidential documents about patients (Alghamdi, 2022, Chawki, 2021, Gafni, Pavel, 2021).
Other researchers compared criminal patterns during the COVID-19 outbreak with those of earlier pandemics. Levi and Smith (2021) compared COVID-19 with the Spanish flu pandemic of 1918 by analyzing the common features that lead to different crimes impacting society. Besides the Spanish flu, they considered other influenza pandemics, such as the Asian flu (1957–58), the Hong Kong flu (1968), and the swine flu (2009–10). The comparative approach also resulted in the identification of new types of attacks during this pandemic and a proposal of best practices to avoid such attacks.
Furthermore, several studies have analysed phishing emails with the aim of detecting time patterns during the COVID-19 pandemic. To understand the diverse attacks during the pandemic, Lallie et al. (2021) constructed a worldwide timeline of COVID-19 (from 2019 to 2020). They searched for attack patterns occurring alongside COVID-19 related events in different countries (e.g., China, the UK, Spain, the USA, Italy, and the Philippines). Events included, among others, government announcements and articles and reports published by the media. The study concluded that 86% of the 43 cyber attacks analyzed involved phishing and/or smishing. The researchers further identified new registrations of malicious website domains containing corona-related keywords and proposed a few solutions to diminish the cyber-attack rate. Kemp et al. (2021) further analyzed cybercrime trends based on crime reported in the UK, using a timeline analysis of the number of crimes reported over time. Venkatesha et al. (2021) performed a similar study, identifying the causes of social engineering attacks during the COVID-19 pandemic and proposing a few techniques to avoid such attacks. The work of Sood et al. (2021) detects trends in the total number of malware- and phishing-related messages blocked by Google during April 2020, both in emails and in communication tools such as Google Meet (Kumaran and Lugani, 2020).
Another group of studies focuses on identifying and aggregating the modus operandi used in phishing emails during the COVID-19 pandemic. A survey by Al-Qahtani and Cresci (2022) reviewed 54 studies on phishing attacks. It analysed the modus operandi and the techniques proposed for detecting COVID-19 phishing, smishing, and vishing attacks. According to the Microsoft Digital Defense Report, phishing accounts for almost 70% of all cyber attacks (Kaliňák, 2021). Akdemir and Yenal (2021) analysed 208 COVID-19 phishing emails in April 2020 and identified nine subjects that were used to target organizations and individual users: fear appeals, urgency cues, source credibility, authority, liking, social proof, consistency commitment, scarcity, and reciprocity. Many of these are Cialdini's principles of persuasion (Cialdini and Sagarin, 2005). Sharevski et al. (2022) explored, in a laboratory setting, people's susceptibility to phishing via QR codes, a technology often used during the COVID-19 pandemic.
There is also attention to the automated classification of COVID-19 related phishing emails. Since phishing has grown to a substantial scale, multiple researchers (Alsmadi, Alhami, 2015, Hamid, Abawajy, 2013, Karim, Azam, Shanmugam, Kannoorpatti, 2020) considered machine-learning algorithms such as k-means, OPTICS, and k-modes for email clustering (Zubair et al., 2021). These methods classify email contents according to the similarity of their features to quickly gain insights into the malicious activities performed by attackers, i.e., the modus operandi. Behaviour-based classification methods (Hamid, Abawajy, 2011, Toolan, Carthy, 2010) have been investigated along with content analysis of emails (Basnet, Sung, 2010, Fette, Sadeh, Tomasic, 2007). Email profiling methods have been studied to detect patterns in important email features. Examples are hyperlinks and the email subject (Gansterer, Pölz, 2009, Hamid, Abawajy, 2013, Yearwood, Mammadov, Webb, 2012), header and domain features (Karim et al., 2020), and URLs in the message content (Afandi, Hamid, 2021, Ispahany, Islam, 2021). The literature further reports on systems and case studies of automatic phishing classification (i.e., determining whether emails are phishing or not). In that regard, Karim et al. (2020) proposed an automated framework for anti-spam detection that exploited unsupervised methodologies. Ispahany and Islam (2021) proposed a machine-learning classification technique for detecting malicious URLs. In the framework proposed by Xia et al. (2021), COVID-19 related keywords were identified to detect malicious domains. In addition, Afandi and Hamid (2021) used the KNN algorithm to detect phishing hyperlinks on four datasets: PhishTank, Kaggle, SpyCloud, and DomainTool. However, their study was limited to five hyperlink features, leaving room for a more detailed analysis. Further, Kawaoka et al. (2021) and Pletinckx et al. (2021) followed a related path by analyzing early COVID-19 related domain name registrations. Patgiri et al. (2019), Patil and Patil (2018), and Rameem Zahra et al. (2021) use machine learning methods, such as decision trees and fuzzy logic, to identify malicious URLs. Natural language processing techniques (Sahingoz et al., 2019) and Shannon’s entropy (Verma and Das, 2017) are also used to determine the maliciousness of a URL. Lastly, a case study on Twitter data explores malicious and inconsistent URLs during COVID-19 to identify link-sharing patterns (Horawalavithana et al., 2021). The authors suggest improving topic moderation techniques on Twitter to counter bad actors who promote malicious activities. Future work could further investigate the characteristics of these bad actors and how they plan their activities during a crisis.
Finally, some studies concentrate on crime patterns and the shift to online crime in general (Hardyns et al., 2021), and on victimization during COVID-19. Hardyns et al. (2021), for example, studied common crime patterns such as burglary, violence, and vehicle theft during the pandemic in Belgium. They found that cases of domestic violence and the general crime rate reported during the corona period were similar to those from 2015 to 2019. In contrast, growth was observed for cybercrime, particularly phishing and online scams. Victimization should also be taken into account to understand and analyze the activities performed by attackers, as the fraud committed during COVID-19 affects victims socially and mentally. Such a study was conducted by Kennedy et al. (2021), who surveyed 2200 Americans during COVID-19. The paper discusses the facts of victimization and proposes solutions to mitigate cybercrime at a particular time of COVID-19. However, the authors also point out that the proposed solutions are consistent with studies conducted in periods before the pandemic.
As discussed in the literature (Aleroud and Zhou, 2017), many studies have been conducted to understand and analyze the behavior of cybercriminals, and various methods have been proposed to mitigate malicious activities. However, a thorough analysis of the characteristics of phishing emails during COVID-19 is lacking. In the present study, we analyze the behaviour of cyber criminals as reflected in phishing emails received at company domains in the Netherlands. In addition, we examine the impact of COVID-19 on phishing emails by considering various trends and events announced by the government.
3. Methodology
Given the importance of analyzing the new phishing behavior that appeared during the pandemic, the present study focuses on COVID-19 related phishing emails. It includes a content analysis and a trend analysis, to better understand how attackers adapted to new societal conditions. The goal of the analysis is to create insights into the patterns applied to abuse the COVID-19 pandemic and deceive people with phishing schemes. This leads to the following key research question: what effects did COVID-19 have on patterns in phishing emails?
The key research question is concerned with explaining the practices (behavior) of cyber criminals in a time of crisis. This is a typical interpretive question, since it aims to gain in-depth knowledge of actor behavior in its natural context while developing an empathetic understanding of the actors' actions (Goldkuhl, 2012). As phishing is illegal in most countries, it is difficult to interact directly with actors on a large scale to study their patterns of behavior. In the Netherlands too, email phishing is a criminal activity that is actively monitored and prosecuted (Fraudehelpdesk, Rechtsraak). The data trail that phishers create, however, remains a rich source for gaining a large-scale overview of, and insights into, COVID-19 related phishing. Therefore, we adopt a quantitative research approach. Data collection is based on document (email) analysis, and the empirical method selected is content analysis. We intentionally do not differentiate between perpetrators' motivations or specific types of criminals, such as nation-state actors, hacktivists, or people motivated by the thrill of criminal activities. It would be interesting to determine the motivation of the senders of the phishing emails studied. However, our data does not allow us to identify the perpetrators, nor do we consider this within the scope of this study. The study aims to understand the emergence and adaptation of content related to COVID-19.
3.1. Dataset description
The dataset used in this research contains COVID-19 related phishing emails. The data was collected by Tesorion.¹ The emails were collected via 1105 top-level domains² that were previously managed by Tesorion but have been taken out of use. The data was collected between 17 January 2020 and 8 March 2021. This period starts when Tesorion began collecting, just before the European pandemic outbreak, and ends about one year after the first COVID-19 restrictions were announced in Europe. The inclusion criterion for classifying emails as COVID-19 related was based on a list of COVID-19 related keywords such as Covid-19, corona, or Pandemic (for the full list see Appendix A). The list of keywords was derived from several other papers and online sources discussing corona-related phishing and corona-related spam (Chen, Lerman, Ferrara, 2020, Cinelli, Quattrociocchi, Galeazzi, Valensise, Brugnoli, Schmidt, Zola, Zollo, Scala, 2020, Kouzy, Abi Jaoude, Kraitem, El Alam, Karam, Adib, Zarka, Traboulsi, Akl, Baddour, 2020, Kousha, Thelwall, Mimecast, 2020). The total number of corona-related emails received at these domains is 1.076.541. The emails contain the following key features, which are used for the analysis: mail_id, received_date, from_address, subject, filename, hash, plain_body, html_body, to_domain_id, and attachment.
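The keyword-based inclusion criterion described above can be sketched as a simple case-insensitive match. This is a minimal sketch, not the authors' implementation. It assumes that only the three example keywords quoted in the text are used (the full list in Appendix A is not reproduced). It also assumes that the subject and both body fields are searched. The sample records are hypothetical and reuse the field names listed above.

```python
import re

# Assumption: only the three example keywords quoted in the text; the full
# list (Appendix A of the paper) is not reproduced here.
COVID_KEYWORDS = ["covid-19", "corona", "pandemic"]

# One case-insensitive alternation; "corona" also matches inside "coronavirus".
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in COVID_KEYWORDS), re.IGNORECASE)


def is_covid_related(email: dict) -> bool:
    """Return True if the subject or a body field contains a COVID-19 keyword.

    Which fields the study searched is an assumption; missing or None fields
    are treated as empty strings.
    """
    fields = ("subject", "plain_body", "html_body")
    text = " ".join(email.get(f) or "" for f in fields)
    return KEYWORD_RE.search(text) is not None


# Hypothetical sample records using the field names listed above.
emails = [
    {"mail_id": 1, "subject": "Coronavirus update for your account", "plain_body": ""},
    {"mail_id": 2, "subject": "Your invoice", "html_body": "<p>Pay now</p>"},
    {"mail_id": 3, "subject": "Pandemic relief payment", "plain_body": None},
]
covid_ids = [e["mail_id"] for e in emails if is_covid_related(e)]
print(covid_ids)  # [1, 3]
```

Plain substring matching is deliberately permissive. It favours recall, so a downstream step (such as the phishing classification in Section 3.2) is still needed to discard legitimate emails that merely mention the pandemic.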
3.2. Pre-processing data
To prepare the data, we follow the pre-processing guidelines described in Gibert et al. (2016). This resulted in the following filtering and pre-processing steps (see Fig. 1 for the process flow):
1. First, we divide the initial dataset (1.076.541 emails) into emails with attachments (148.295) and emails without attachments (928.246). This study focuses solely on emails without attachments, because we are primarily interested in the body content of emails, which can be analyzed further with NLP techniques. Attachment analysis is much more oriented towards software injection and falls beyond the scope of this paper; it is addressed as future work.
2. On the COVID-19 related emails without attachments, we apply a number of filters to the HTML body to retrieve only the textual content, so that it can be used for topic modelling. The following functions are applied in order: (i) use the BeautifulSoup Python package (Crummy, 2021) to get the textual contents of emails, (ii) remove email addresses, (iii) remove all non-ASCII characters, (iv) lowercase all words, (v) remove URLs, (vi) remove HTML special characters, (vii) remove all types of brackets, (viii) remove unnecessary white spaces, tabs, and newlines, (ix) remove (e)numerations, and (x) remove punctuation.
3. The initial dataset contains emails in different languages, such as English, Dutch, French, and German. Therefore, we determined the language of each email using the Python package CLD3 (Google, 2020) and applied further pre-processing only to English emails. Targeting only English emails avoids the challenges of clustering topics across multiple languages. This step reduces the dataset to 594.895 emails.
4. In addition, we removed emails with a duplicate email body, which further reduced the dataset to 104.228 unique emails.
5. Next, we determined which of these emails are phishing emails. This was achieved with the help of the VirusTotal API (VirusTotal, 2020), resulting in the identification of 29.171 phishing emails. VirusTotal is currently considered one of the top-performing tools for classifying phishing emails (Choo et al., 2022a), and recent studies estimate the accuracy of its phishing classification at 81.72% (Choo et al., 2022a). To determine whether a URL is regarded as phishing, VirusTotal queries over 70 antivirus scanners and services. It returns whether, and by how many of them, the submitted URL was flagged as malicious (VirusTotal, 2022). The topic model analysis, further described in Section 3.3, is based on these phishing emails. We used all phishing emails (including similar emails) as the basis for the topic model analysis so as to consider all possibly relevant topics.
6. To prepare for topic analysis, we removed common words from the email bodies to derive a more refined set of words that determine the topic clusters. First, we remove all common words and stop words based on the NLTK corpus (NLTK, 2021). Second, we filter out the 35.000 least significant words according to term frequency-inverse document frequency (TF-IDF) (Luhn, 1957, Spärck Jones, 1972); this number was determined through manual experimentation. Finally, we remove keywords that at least two of the authors judged to overlap too much with other clusters, with the goal of making the topic clusters more distinct from each other. The keywords removed are: “view”, “email”, “click”, “offer”, “shop”, “free”, “com”, “open”, “sale”, “house”, “health”, “detail”, “unsubscribe”, “company”, “store”, “app”, “address”, “buy”, “receive”, “day”, “delay”, “business”, “south”, “said”, “product”, “game”, “week”, “new”, “test”, “covid”, “coronavirus”, “trade”, “united”, “best”, “service”, “time”, “change”, “online”.
7. In the next step, we used the cosine similarity measure (Manning et al., 2008) to remove near duplicates: phishing emails whose body is not identical to, but very similar to, that of another email. The goal of this step is to reduce noise in order to better identify existing trends. We used a similarity threshold of 0.95. This threshold was determined through manual inspection (Akhtar et al., 2017) of the effectiveness of duplicate removal. We reviewed small samples of emails above the similarity threshold and checked whether they were near duplicates. In effect, this is a search for an optimal true-positive/false-positive rate for the parameter setting (here, the similarity threshold). The search was done on a small sample rather than on the full dataset, since the data is not annotated for similar items and annotating it would require considerable effort. The removal reduced the set of phishing emails to 11.765, which formed the basis for the trend and timeline analysis (see Sections 4.2 and 4.2.2).
8. After we had identified meaningful patterns, a set of emails remained in which potentially more patterns could be found. Therefore, we removed the identified patterns, as well as emails that can be grouped together but do not describe a technical or semantic pattern used by criminals (see Section 4.5). As an example, we identified around 100 Google Alert emails,³ which have possibly been falsely classified as phishing. This step reduced the dataset to 7.397 emails.
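Steps 2, 4, and 7 above can be approximated with the Python standard library alone. The sketch below illustrates them under stated assumptions and is not the authors' pipeline. It swaps in `html.parser` for BeautifulSoup and interprets "(e)numerations" as digit sequences. It represents email bodies as bag-of-words counts for the cosine similarity and keeps the first email of each near-duplicate group. The helper names and sample emails are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes outside <script>/<style> (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_body(html_body: str) -> str:
    """Apply the step 2 filters, in the listed order, to one HTML body."""
    parser = TextExtractor()
    parser.feed(html_body)
    parser.close()
    text = " ".join(parser.parts)                          # (i) textual content
    text = re.sub(r"\S+@\S+", " ", text)                   # (ii) email addresses
    text = text.encode("ascii", "ignore").decode()         # (iii) non-ASCII characters
    text = text.lower()                                    # (iv) lowercase
    text = re.sub(r"(?:https?://|www\.)\S+", " ", text)    # (v) URLs
    text = re.sub(r"&#?\w+;", " ", text)                   # (vi) leftover HTML entities
    text = re.sub(r"[()\[\]{}<>]", " ", text)              # (vii) brackets
    text = re.sub(r"\s+", " ", text)                       # (viii) whitespace, tabs, newlines
    text = re.sub(r"\d+", " ", text)                       # (ix) numbers (assumed meaning)
    text = re.sub(r"[^\w\s]", " ", text)                   # (x) punctuation
    return re.sub(r"\s+", " ", text).strip()               # final tidy-up (our addition)


def drop_exact_duplicates(bodies):
    """Step 4: keep the first occurrence of each identical body, preserving order."""
    return list(dict.fromkeys(bodies))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def drop_near_duplicates(bodies, threshold=0.95):
    """Step 7: drop a body if it is at least `threshold` similar to one already kept."""
    kept, vectors = [], []
    for body in bodies:
        vec = Counter(body.split())
        if all(cosine(vec, other) < threshold for other in vectors):
            kept.append(body)
            vectors.append(vec)
    return kept


offer = ("Stay safe during the corona pandemic and order certified face masks "
         "today with free shipping to your home address")
raw = [
    f"<p>{offer}</p>",
    f"<div>{offer.upper()} https://example.com</div>",  # same text after cleaning
    f"<p>{offer} now!</p>",                             # near duplicate
    "<p>Your COVID-19 test result (ref. 123) is ready</p>",
]
cleaned = [clean_body(b) for b in raw]
unique = drop_exact_duplicates(cleaned)
distinct = drop_near_duplicates(unique)
print(len(unique), len(distinct))  # 3 2
```

The greedy pairwise comparison is quadratic in the number of emails. At the scale of the dataset described here, an indexed or vectorized similarity search would likely be needed instead.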
Fig. 1. Overview of the methodology.
3.3. Analysis approach
To investigate how attackers use COVID-19 keywords in their phishing schemes, we searched for a model that can represent texts of different sizes in a feature space that clustering algorithms can work with (see Fig. 1, topic modelling). We decided to use Doc2Vec (Le and Mikolov, 2014) in combination with the k-means clustering algorithm (Lloyd, 1982), similar to the approaches of Budiarto et al. (2021) and Wang and Kwok (2021). We chose Doc2Vec over other text representations, such as bag-of-words (Harris, 1954), because it can incorporate the semantics of a text in its model (Le and Mikolov, 2014). In the course of the analysis, however, we realized that this method does not work well with our data: k-means could not find meaningful clusters. We did not investigate in detail why Doc2Vec did not work on the data used in this study. However, we suspect that the data quality was not good enough to produce satisfying results. The emails differ widely in length, some are semantically incoherent (seemingly random combinations of text blocks), and some cover multiple topics. We therefore tried a popular statistical model, Latent Dirichlet Allocation (LDA) (Blei et al., 2003), to find clusters (topics) in the dataset. The standard Gensim LDA model (Řehuřek, 2021) is used in combination with the pyLDAvis library (Mabey, 2021) to visualize the topic clusters. To determine a suitable number of clusters, we tried several values and checked which number yielded a reasonable outcome (see Section 4.1).
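For readers unfamiliar with LDA, the sketch below fits a small LDA model by collapsed Gibbs sampling in pure Python. It is an illustrative stand-in, not the Gensim model used in the study, which trains `LdaModel` with online variational Bayes rather than Gibbs sampling. The hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are arbitrary assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] estimates P(topic k | doc d)
    and topic_word[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the token out of the counts ...
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ... and resample its topic from the collapsed full conditional.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
                  for k in topics]
    return doc_topic, topic_word


docs = [
    "mask order mask shipping mask".split(),
    "order mask shipping order".split(),
    "vaccine appointment booking vaccine".split(),
    "appointment vaccine booking appointment".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

Re-running the fit with different `num_topics` values and inspecting the top words per topic mirrors, on a toy scale, the search for a reasonable number of clusters described above.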
The second and third analyses concern trends and timelines. We tried to understand whether phishing emails follow any trends or relate to specific events. To get insights into the general timeline of phishing emails, we used standard Python visualization libraries such as matplotlib (Matplotlib.org, 2022) to create time plots. When we observed spikes or other interesting points in the graphs, we manually investigated which types of emails made up that spike. This research further investigates whether phishing campaigns made use of current events related to corona. As a reference, we used the timelines of COVID-19 measures and other related events of the Dutch government (Ministerie van Volksgezondheid, 2023) and the WHO (World Health Organization, 2022). For verification, we inspected the days on which a high number of emails was received. We then manually checked whether those emails mentioned any events listed in the timelines around that day. The fourth analysis searches for date patterns. The fifth analysis is concerned with hidden content. During the topic model analysis, we observed that some emails contained hidden text, e.g., white letters on a white background. We then used regular expressions to find more emails of this type to get better insight into this pattern (see Section 4.3). Finally, we assessed whether dominant patterns or trends had distorted the data in a way that would impair our view of other existing patterns. To do so, we subtracted the identified patterns and assessed whether the remainder contained interesting patterns. For this verification process, we formed two assumptions with which we could verify whether our findings hold:
1. If we remove dominant patterns and other frequently occurring types of emails (see Table 1), the general trend in the data remains unchanged. This means that the trend is not shaped by dominant data, but appears as a general trend (caused by a larger group of attackers/attacks).
2. If we remove the dominant pattern, no other spikes appear in the data. This means that we have likely caught the largest campaigns that are event- or date-specific.
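The spike inspection and the second verification assumption can be made concrete with daily counts. In this sketch, a day is flagged as a spike when its count exceeds the mean by more than three standard deviations. This z-score rule is our assumption; the study located spikes by visually inspecting time plots. The email records and the `"face-mask"` pattern label are synthetic.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def daily_counts(received_dates, start, end):
    """Number of emails per calendar day from start to end, zero-filled."""
    counts = Counter(received_dates)
    n_days = (end - start).days + 1
    return {start + timedelta(days=i): counts[start + timedelta(days=i)] for i in range(n_days)}


def find_spikes(series, z=3.0):
    """Days whose count exceeds the mean by more than z population standard deviations."""
    values = list(series.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [day for day, n in series.items() if (n - mu) / sigma > z]


# Synthetic data: two background emails per day in March 2020 ...
start, end = date(2020, 3, 1), date(2020, 3, 31)
emails = [(start + timedelta(days=i % 31), "other") for i in range(62)]
# ... plus a hypothetical 40-email campaign on a single day.
emails += [(date(2020, 3, 16), "face-mask")] * 40

spikes = find_spikes(daily_counts([d for d, _ in emails], start, end))
# Verification assumption 2: without the dominant pattern, no other spike should remain.
remaining = [d for d, pattern in emails if pattern != "face-mask"]
spikes_after = find_spikes(daily_counts(remaining, start, end))
print(spikes, spikes_after)  # [datetime.date(2020, 3, 16)] []
```

A fixed z threshold is only a rough proxy for visual inspection. Long-running campaigns raise the mean and standard deviation themselves, so flagged days would still need the manual check against the event timelines described above.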
Table 1. COVID-19 Phishing Patterns.
Pattern name | Pattern description | Pattern type | Relation to COVID-19 | Attacker motivation | Revealed in | Literature |
---|---|---|---|---|---|---|
Hidden pixel | A pattern frequently observed to add hidden textual content to an email in a small image, usually with the intent to mislead spam filters into classifying the email as genuine | Recurring | Predominant use of COVID-19 related news articles | To disguise | 4.2, 4.3 | Web bugs (Martin, Wu, Alsaid, 2003, McRae, Vaughn, 2007), tracking pixel (Hu et al., 2019) |
White color font | A pattern frequently observed to add hidden textual content to the background of an email, usually with the intent to mislead spam filters into classifying the email as genuine | Recurring | Predominant use of COVID-19 related news articles | To disguise | 4.2, 4.3 | Hidden salting (Bergholz, Paass, Reichartz, Strobel, Moens, Witten, 2008, Bergholz, De Beer, Glahn, Moens, Paaß, Strobel, 2010, Jáñez-Martino et al., 2022) |
HTML Email Preheader Text | A pattern frequently observed to add textual content to an email, usually with the intent to mislead spam filters into classifying the email as genuine | Recurring | Predominant use of news articles; both COVID-19 and non-COVID-19 related | To disguise | Section 4.3 | HTML tag (Jáñez-Martino et al., 2022) |
Unsubscribe button | Fictive clickable link stating “unsubscribe”. Variants display “Manage subscriptions” | Recurring | No | To disguise | Section 4.3 | “Unsubscribe” spam attack (Tsow and Jakobsson, 2007) |
Encoded HTML | Emails contain base64-encoded instructions (e.g., POST requests). Often the emails ask the recipient to enter credentials into a field in the email (e.g., a fake login to view a document) | Recurring | No | To disguise | Section 4.5 | Unicode (Liu and Stamm, 2007), encoding (Jáñez-Martino et al., 2022) |
Bit.ly links | Emails start with a malicious bit.ly link, accompanied by news headlines | Recurring | No | To disguise | Verification | URL features (Adebowale et al., 2019), URL shortener/tracker (Bhardwaj, Sapra, Kumar, Kumar, Arthi, 2020, Blancaflor, Alfonso, Banganay, et al., 2021, Niu, Zhang, Yang, Ma, Zhuo, 2017, Petelka, Zou, Schaub, 2019) |
Emoji in subject | An emoji is added to the subject line (various emojis observed) | Recurring | No | To disguise | Verification | Obfuscated words (Jáñez-Martino et al., 2022) |
Clickable image | Emails frequently include a clickable image (via i.imgur.com) that links to a phishing website | Recurring | No | To disguise | Section 4.3 | Image features (Adebowale et al., 2019) |
News headlines | The email subject is disguised as a real news headline to gain the recipient's interest and forward them to a fake store, where they are induced to make purchases and their data is extorted and swindled | Recurring | Predominant use of corona-related news articles | To gain interest | Section 4.2 | Obfuscated words (Jáñez-Martino et al., 2022), fake news headlines (Sarno et al., 2022) |
Face-mask | Emails try to scare the recipient into buying face masks, with the purpose of collecting personal information | Recurring | Yes | To gain interest | 4.2, 4.2.1, 4.3 | Profiled purchasing (Hamid and Abawajy, 2013), compulsive buying (Halevi et al., 2015), deals too good to be true (Kirlappos and Sasse, 2011) |
Home warranty | Emails try to scare the recipient into taking out a home insurance policy, with the purpose of collecting personal information | Recurring | No | To gain interest | Verification | Profiled purchasing (Hamid and Abawajy, 2013), compulsive buying (Halevi et al., 2015), deals too good to be true (Kirlappos and Sasse, 2011) |
Topic shift to Medical | Phishing emails in the topic clusters “medical” and “information” increased substantially more than other topics | Trend shift | Phishers targeting medical products and services | Likely higher victimization rate | Topic analysis 4.1 | |
Phishing campaign spikes | Many of the peaks in the analysis are caused by phishing campaigns. Phishing emails conform to an email template (same email format but slightly differentiated content, e.g., a different company name), share a sender domain, etc., and are hence assumed to originate from the same attacker. Phishing campaigns, however, do not correlate with specific events | Recurring | | Likely higher victimization rate | Section 4.2 | Spikes, e.g., (Legg, Blackman, 2019, Van Der Heijden, Allodi, 2019) |
Dutch phishing rise March 2020 | The rise of phishing emails in the Netherlands corresponds broadly to the measures announced by the Dutch government, and relates strongly to the announcement classifying COVID-19 as a pandemic | Trend shift | Dutch COVID-19 restriction announcements | Likely higher victimization rate | Section 4.2.1 | |
Day- and time-dependent susceptibility to phishing | The day of the week and the time of day apparently play an influential role in attackers' assumptions about when people are most susceptible to phishing. Patterns follow the work-week patterns of employees (Mon–Fri), with specific break hours and email reading patterns | Recurring | | Likely higher victimization rate | Section 4.2.2 | Weekend pattern (Lastdrager, 2018, Ramzan, Wüest, 2007), peak pattern (Drury et al., 2022) |
4. Analysis: patterns and potential explanations
This section presents and interprets the results of this study. First, the results of the topic model are presented in Section 4.1, followed by an overview of the number of phishing emails received during the time frame of the dataset (Section 4.2). The subsequent sections discuss correlations between phishing emails and COVID-19 related events, as well as findings on domains and on the time and date of received emails (Sections 4.2.1–4.2.3). Section 4.3 then highlights the identified trends and patterns, after which we conduct the verification (Section 4.4). The section ends with a summary of the identified patterns (Section 4.5).
4.1. Topic analysis
The topic analysis clustered all emails classified as phishing emails into 22 topics. These were then merged into 17 clusters and finally into 6 distinct high-level topics (see Fig. 2). We used the LDA algorithm and searched for the most satisfactory number of topics. To assess the coherence of the formed topics technically, we relied on metrics such as the C_V metric, UMass, and normalized pointwise mutual information (NPMI) (Röder et al., 2015), with values 0.582, , and respectively. Röder et al. (2015) suggest that NPMI is the best topic coherence metric for optimization in this regard. Obtaining a score close to zero is a good result, but it should be seen in the context of the data source. To determine the ‘goodness’ of topic independence, we visually inspected 50 randomly selected emails from each topic/cluster. This sample size was chosen to balance the manual effort against the number of emails in each topic. If there was an explanatory pattern among the emails (e.g., the majority concerned Nigerian prince scams), we accepted a topic cluster as reasonably coherent. The topic clusters were initially perceived as sufficiently coherent. Even so, certain topics were so closely related that they could be merged and, in a following step, associated with a more general topic, as seen in Fig. 2. The merging was carried out in two steps: (i) from 22 to 17 topics, to find sufficiently distinct topics, and (ii) from 17 to 6 topics, to group the more refined topics more generally. The colored numbers in Fig. 2 give the size of each topic relative to the cumulative size of all 6 topics. The blue numbers are based on the phishing dataset, and the red numbers on the reduced dataset from which similar emails have been removed (see Fig. 1).
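For readers unfamiliar with the coherence metrics, the sketch below computes NPMI and UMass for a single topic from document-level co-occurrence. It is a simplified illustration on toy data, not the pipeline used in this study. Röder et al. (2015) also estimate the probabilities with sliding windows, and libraries such as gensim provide these measures (e.g., its `CoherenceModel` with `coherence="c_npmi"` or `"u_mass"`). Here, the NPMI limits of 1 and -1 are handled explicitly instead of through an ε smoothing term. UMass uses +1 smoothing over the document frequency of the higher-ranked word.

```python
import math
from itertools import combinations

def doc_freq(docs, *words):
    """Number of documents (token sets) that contain all of the given words."""
    return sum(1 for d in docs if all(w in d for w in words))

def npmi_coherence(top_words, docs):
    """Mean NPMI over all pairs of a topic's top words (document co-occurrence)."""
    docs = [set(d) for d in docs]
    n = len(docs)
    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = doc_freq(docs, wi, wj) / n
        if p_ij == 0:
            scores.append(-1.0)  # the words never co-occur: NPMI lower bound
        elif p_ij == 1:
            scores.append(1.0)   # the words always co-occur: NPMI upper bound
        else:
            p_i = doc_freq(docs, wi) / n
            p_j = doc_freq(docs, wj) / n
            scores.append(math.log(p_ij / (p_i * p_j)) / -math.log(p_ij))
    return sum(scores) / len(scores)

def umass_coherence(top_words, docs):
    """Mean UMass score; top_words must be ordered by descending topic probability."""
    docs = [set(d) for d in docs]
    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):  # word j ranks higher than word i
            d_j = doc_freq(docs, top_words[j])
            d_ij = doc_freq(docs, top_words[i], top_words[j])
            scores.append(math.log((d_ij + 1) / d_j))
    return sum(scores) / len(scores)

# Toy tokenized emails; a real run would use the pre-processed phishing corpus.
docs = [
    ["mask", "covid", "buy"],
    ["mask", "covid", "order"],
    ["news", "covid"],
    ["news", "election"],
]
print(npmi_coherence(["mask", "covid"], docs), umass_coherence(["covid", "mask"], docs))
```

Averaging such scores over all topics gives a model-level value that can be compared across candidate numbers of topics.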
Fig. 2.
Overview of topics and how they were merged. Advertisement: COVID-19 related emails that advertise various types of products to the recipient. News: COVID-19 related emails containing news to the reader on all sorts of topics. Information: COVID-19 related emails with the goal to inform the reader about various business topics/situations/regulations etc. Government: COVID-19 related emails that concern political or governmental affairs. Medical: COVID-19 related emails that are focused on health or healthcare in a broader sense. Other: COVID-19 related emails to which no general topic was found.
Figure 3 shows the size of each of these clusters over time, based on the same dataset as the topic model. The number of COVID-19 related phishing emails is highest at the beginning of the pandemic in the Netherlands. Emails with health-related topics (medical) in particular show a sharp increase followed by a decrease during this period. This might indicate that phishers framed emails around medical services or goods at the beginning of the pandemic, when those products were in high demand. Figure 3 also shows increases in the topics 1) information, 2) news, 3) advertisement, and 4) medical that relate to the two lockdowns in the Netherlands (the ‘intelligent lockdown’ from 23 March to 31 May 2020, and the ‘full lockdown’ from 15 December 2020 to 23 January 2021). Meanwhile, the topics 5) government and 6) other remain relatively constant.
Fig. 3.
Size of abstract topic clusters per month, based on the phishing dataset.
4.2. Timeline overview
Figure 4 shows the number of COVID-19 phishing emails over the period in which the phishing emails were collected. The figure shows spikes on individual days or short periods in which the number of COVID-19 phishing emails is substantially higher than during other periods.
Fig. 4.
Timeline of filtered COVID-19 phishing emails.
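The spikes in Fig. 4 were identified by inspecting the time plot. The Python sketch below illustrates how such days could also be flagged automatically. It counts emails per day and marks days whose count exceeds the median daily count by more than k times the median absolute deviation (MAD). The MAD rule, k = 5, and the synthetic data are assumptions rather than the study's procedure. Days without any email are absent from the counts.

```python
from collections import Counter
from datetime import date
from statistics import median

def daily_counts(received_dates):
    """Number of emails per calendar day (days without email are absent)."""
    return Counter(received_dates)

def flag_spikes(counts, k=5.0):
    """Days whose count exceeds median + k * MAD of the observed daily counts."""
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    threshold = med + k * max(mad, 1)  # floor the MAD at 1 so flat series still work
    return sorted(day for day, n in counts.items() if n > threshold)

# Synthetic example: roughly 10-12 emails per day, with a burst on 26 March 2020.
dates = [date(2020, 3, d) for d in range(20, 31) for _ in range(100 if d == 26 else 10 + d % 3)]
print(flag_spikes(daily_counts(dates)))
```

A flagged day is only a candidate. As described below, the emails behind each spike still have to be inspected to see what forms it.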
An example of one of these spikes is on March 26th, 2020. Of all the emails received on that day, almost 80% share similar characteristics. These emails are sent from the same domain (‘unfortunatedeadly.icu’), contain a hidden pixel, and have similar elements within the HTML body that arrange the formatting of the phishing email, but their textual content differs and is unintelligible. All these emails try to scare the reader into buying face masks. Other spikes, such as those on the 6th of July 2020 and the 14th of December, are also caused by phishing campaigns, but with a different characteristic pattern and phishing scheme (i.e., fake news headlines with a link to a store on the 6th, and the sale of home warranty protection plans on the 14th). When we decompose the peaks in Fig. 4 into the topic clusters that form them, we observe that each peak is formed primarily by one topic cluster, as seen with the peak on March 26th. This suggests that the peaks are the result of phishing campaigns.
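The decomposition of a spike can be illustrated with a short Python sketch. For the emails of one day, it reports the most common sender domain and topic cluster together with their shares. The (sender domain, topic cluster) pairs and all example domains other than ‘unfortunatedeadly.icu’ are hypothetical. A high share for a single domain and cluster is consistent with a single campaign, but it does not prove one.

```python
from collections import Counter

def dominant_share(values):
    """Most common value and the fraction of all items it accounts for."""
    value, n = Counter(values).most_common(1)[0]
    return value, n / len(values)

def describe_spike(day_emails):
    """Summarise one spike day given (sender_domain, topic_cluster) pairs."""
    domain, domain_share = dominant_share([d for d, _ in day_emails])
    cluster, cluster_share = dominant_share([c for _, c in day_emails])
    return {"domain": domain, "domain_share": domain_share,
            "cluster": cluster, "cluster_share": cluster_share}

# Hypothetical spike day in which 8 of 10 emails come from one sender domain.
day = [("unfortunatedeadly.icu", "Medical")] * 8 + [("a.example", "News"), ("b.example", "Information")]
print(describe_spike(day))
```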
4.2.1. Timeline correlated COVID-19 events
We analyse correlations between COVID-19 events, such as newly introduced measures, and the contents of phishing emails. Our initial assumption was that some spikes and trends (in Fig. 4) would correlate with specific events. However, none of the spikes could be traced back to corona-related events using the approach explained in Section 3.3. It therefore seems that most large phishing campaigns are not event-related and last a few weeks. New campaigns also appear to be launched at a steady pace, which results in a continuous, steady stream of COVID-19 phishing emails. All spikes (in Fig. 4) could be explained by phishing campaigns that were sent out with slightly altered content in a similarly structured format. We did, however, find changes in the trend that are associated with corona-related events:
• The number of phishing emails rises suddenly from the 12th of March. The rise is likely associated with national awareness, the announcement that a pandemic had begun, and the consequent political decisions on pandemic control (Cucinotta and Vanelli, 2020).
  - On the 12th of March, a press conference was held in the Netherlands at which the first nationwide strict measures were announced (e.g., cancelling events and closing higher education).
  - On the 16th of March, the Dutch prime minister (Mark Rutte) addressed the nation about the COVID-19 virus in a TV speech (the previous address to the nation was during the 1973 oil crisis).
• From April 2020, the trend slowly decreases until August 2020, after which it stabilizes.
• The low number of COVID-19 related emails in the summer months could be caused by the easing of restrictions during that time: email is read less, and people are less scared of COVID-19 and hence less likely to fall for phishing schemes.
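The check described in Section 3.3 asks whether the emails received on a spike day mention a timeline event from around that day. It can be sketched in Python as follows. The event dates are those mentioned in Sections 4.1 and 4.2.1. The keywords, the ±3-day window, and the function name are illustrative assumptions. In the study, this matching was done by hand against the full government and WHO timelines.

```python
from datetime import date, timedelta

# Event dates from Sections 4.1 and 4.2.1; the keywords are hypothetical.
EVENTS = [
    (date(2020, 3, 12), "press conference on first nationwide measures", ("press conference",)),
    (date(2020, 3, 16), "TV address by the prime minister", ("prime minister", "rutte")),
    (date(2020, 3, 23), "start of 'intelligent lockdown'", ("intelligent lockdown", "lockdown")),
    (date(2020, 12, 15), "start of full lockdown", ("lockdown",)),
]

def matched_events(spike_day, email_texts, events=EVENTS, window_days=3):
    """Labels of timeline events near spike_day that the day's emails mention."""
    window = timedelta(days=window_days)
    texts = [t.lower() for t in email_texts]
    return [
        label
        for when, label, keywords in events
        if abs(when - spike_day) <= window
        and any(k in t for k in keywords for t in texts)
    ]

print(matched_events(date(2020, 3, 26), ["Buy face masks now!", "Protect your family"]))
print(matched_events(date(2020, 3, 24), ["The new lockdown closes all shops"]))
```

The first call mirrors the situation reported above: an event falls within the window, but the emails do not mention it.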
4.2.2. Date patterns
We consider the working patterns of cybercriminals when dates, times (by hour), and/or numbers of emails appear to correlate.
As shown in Fig. 5, the number of COVID-19 phishing emails grows between 9:00 (strongest increase) and 17:00 (strongest decrease after the plateau). This reflects the “9:00 to 17:00” workweek, the common working pattern (before COVID-19) of many organizations: Dutch organizations generally start at 8:30 and end at 17:00, with a half-hour break at 12:30–13:00. Remarkably, the highest peak is at 11:00, shortly after the second coffee break (which on average starts at 10:30 and lasts some 10 to 15 min). The peak is followed by a ‘lunch time’ dip from 12:00–13:00 and then a plateau from 14:00 to 16:00. The second, slighter peak at 16:00 may be explained by phishers aiming for the ‘getting home early to pick up the kids’ rush. At that moment it may be easier to deceive people. It may also give the phishing more time before it is discovered, since the employee has gone home, left work tasks behind, and switched off from work. The distribution on the weekend is more varied, but it coincides with the 11:00 peak of working days. It also shows that most emails are received during working hours even on the weekend. There are two hypotheses to explain the patterns between 9:00 and 17:00 in Fig. 5. On the one hand, criminals may believe that following a usual working day makes their phishing more effective, while the emails themselves are sent automatically. On the other hand, the pattern could reflect the working hours of the criminals, showing that they also follow a ‘9:00–17:00’ job and send emails during their working hours.
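A profile like Fig. 5 requires converting receipt timestamps to local time before binning by hour. The Python sketch below does this with a fixed UTC+01:00 offset, matching the time zone stated in the caption of Fig. 5, and keeps separate counts for weekdays and weekends. The fixed offset is a simplifying assumption that ignores daylight saving time; the standard library's `zoneinfo.ZoneInfo("Europe/Amsterdam")` would account for it. The timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC+01:00 offset as in Fig. 5; this ignores daylight saving time.
AMS = timezone(timedelta(hours=1))

def hourly_profile(timestamps):
    """Count emails per local hour of day, separately for weekdays and weekends."""
    weekday, weekend = Counter(), Counter()
    for ts in timestamps:
        local = ts.astimezone(AMS)  # ts must be timezone-aware
        (weekend if local.weekday() >= 5 else weekday)[local.hour] += 1
    return weekday, weekend

stamps = [
    datetime(2020, 3, 26, 10, 0, tzinfo=timezone.utc),   # Thursday, 11:00 local
    datetime(2020, 3, 27, 23, 30, tzinfo=timezone.utc),  # Saturday, 00:30 local
    datetime(2020, 3, 28, 9, 30, tzinfo=timezone.utc),   # Saturday, 10:30 local
]
weekday, weekend = hourly_profile(stamps)
print(sorted(weekday.items()), sorted(weekend.items()))
```

The second timestamp shows why the conversion matters: in UTC it falls on a Friday, but in local time it is received on Saturday.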
Fig. 5.
No. of COVID-19 phishing emails by hour of day (time zone AMS (UTC + 01:00)) during the week and weekend.
In line with Ramzan and Wüest (2007) and Lastdrager (2018), we notice a sharp drop of nearly 50% in received emails during the weekend as reflected in Fig. 6 .
Fig. 6.
COVID-19 phishing emails by day of week.
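The "nearly 50%" weekend drop can be expressed as the relative difference between the mean daily count on weekend days and on weekdays. The Python sketch below computes it on synthetic data. It enumerates every calendar day in the span so that days without email count as zero. Both the measure and the function name are illustrative assumptions, not necessarily how Fig. 6 was produced.

```python
from collections import Counter
from datetime import date, timedelta

def weekend_drop(dates):
    """Relative drop of the mean daily email count on Saturdays/Sundays vs. weekdays.

    Every calendar day between the first and last date is included, so days
    without email count as zero. The span must contain both day types.
    """
    counts = Counter(dates)
    weekday_counts, weekend_counts = [], []
    day, last = min(dates), max(dates)
    while day <= last:
        (weekend_counts if day.weekday() >= 5 else weekday_counts).append(counts[day])
        day += timedelta(days=1)
    mean_weekday = sum(weekday_counts) / len(weekday_counts)
    mean_weekend = sum(weekend_counts) / len(weekend_counts)
    return 1 - mean_weekend / mean_weekday

# Synthetic week (Mon 23 to Sun 29 March 2020): 10 emails per weekday, 5 per weekend day.
week = [date(2020, 3, 23) + timedelta(days=i) for i in range(7) for _ in range(10 if i < 5 else 5)]
print(weekend_drop(week))
```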
4.2.3. Phishing domains
Figure 7 shows the number of phishing emails containing URLs with the domains listed in the legend. For this analysis, the 5 most frequently occurring phishing domains were selected. The lifetime of the domains varies greatly. For example, all emails with phishing URLs from ‘kiolyduke.casa’ and ‘unfortunatedeadly.icu’ were received within a span of less than 4 h. Phishing campaigns lasting from several hours to a few days are in line with the findings of other researchers, such as McGrath and Gupta (2008), Moore and Clayton (2007), and Oest et al. (2020). In contrast, emails containing phishing URLs from the ‘covidvirus.guru’ domain appeared over several months. The extended use of the domains ‘edmcn.cn’ and ‘app1.ftrans01.com’ is likely due to these being domains of content-sharing providers.
Fig. 7.
Temporal overview of domain URLs used in COVID-19 related phishing emails.
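The selection of the most frequent domains and their lifetimes can be sketched in Python with the standard library. The sketch extracts URL hostnames from each email body, counts each host at most once per email, and records when each host was first and last seen. The URL regular expression is a deliberately simple assumption and will miss obfuscated links. The sample emails are made up.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit

# Simple, assumed URL pattern: scheme followed by non-space, non-quote characters.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def hosts_in(html):
    """Distinct URL hostnames found in an email body."""
    return {h for h in (urlsplit(u).hostname for u in URL_RE.findall(html)) if h}

def top_domains(emails, n=5):
    """Most frequent hostnames, counting each host at most once per email."""
    return Counter(h for _, html in emails for h in hosts_in(html)).most_common(n)

def domain_lifetimes(emails):
    """Map each hostname to (first_seen, last_seen) over (received, html) pairs."""
    spans = {}
    for received, html in emails:
        for host in hosts_in(html):
            first, last = spans.get(host, (received, received))
            spans[host] = (min(first, received), max(last, received))
    return spans

emails = [
    (datetime(2020, 3, 26, 9, 0), '<a href="http://kiolyduke.casa/x">Order</a>'),
    (datetime(2020, 3, 26, 12, 30), '<a href="http://kiolyduke.casa/y">Order</a>'),
    (datetime(2020, 4, 2, 8, 0), '<a href="https://covidvirus.guru/info">Info</a>'),
]
first, last = domain_lifetimes(emails)["kiolyduke.casa"]
print(top_domains(emails), last - first)
```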
4.3. Hidden text analysis
The motivation for identifying hidden text is to recognize new phishing schemes adopted by attackers during the pandemic.
The first phishing scheme we found obfuscates text by coloring it white on a white background, as shown in Example 1, which makes it invisible to the reader.
Example 1.
Hidden text.
For some of the hidden text found in the HTML files, there is a logical explanation for its presence. For example, many emails make use of the ‘HTML Email Preheader Text’, which sets the content that appears as a short line of text after the subject line in an email inbox. It is set by inserting a hidden element at the very start of the email body (Mailtrap, 2022).
Some emails use a very small font size (usually 1 or 2 pixels, with the exceptional case of 0.001 px) in addition to the white color trick. The hidden texts are either collections of nonsense words (see Example 1), ranging from a few words to a paragraph of text, or short texts taken from online sources, e.g., news websites such as BBC.com.
In many phishing emails, we observe these small samples of nonsense text repeated within a single email. These fragments are likely used to circumvent spam filters: adding seemingly legitimate content to the email disguises the real phishing intentions.
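The regular-expression search for hidden text can be illustrated with a Python sketch that looks for inline style declarations hiding text. It covers white font color (but not white background color), font sizes of 0 to 2 px, and `display: none`. The patterns are hypothetical, since the study's own regular expressions are not given in the text. The sketch ignores CSS in `<style>` blocks and near-white colors.

```python
import re

# Hypothetical patterns for inline styles that hide text.
HIDDEN_STYLE_RE = re.compile(
    r"""
    style \s* = \s* ["'] [^"']*?     # inside an inline style attribute
    (
        (?<![\w-])color \s* : \s* (?:\#fff(?:fff)?|white)\b   # white text, not background-color
      | font-size \s* : \s* (?:0|0?\.\d+|[12])px              # font size of 0-2 px
      | display \s* : \s* none                                # element not rendered
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)

def hidden_styles(html):
    """Return the first hiding declaration found in each matching style attribute."""
    return [m.group(1) for m in HIDDEN_STYLE_RE.finditer(html)]

print(hidden_styles('<p style="color:#ffffff;font-size:1px">lorem ipsum</p>'))
print(hidden_styles('<td style="background-color:#fff">Visible</td>'))
```

A match only marks an email for inspection. As noted above, preheader text is hidden on purpose in many legitimate emails too.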
The emails that use hidden text frequently concern face mask offers (see Fig. 9(d)). These emails are sent from different addresses and have different contents and subjects, which could indicate that they were created by different adversaries. However, there is no easily observable relationship that explains the link between the use of this phishing pattern (hidden text) and the selling of masks.
Fig. 9.
Examples of different phishing emails and how they mention COVID-19. The references cited in this figure are [Andrew and Yeung (2020); ...Ferreira and Teles (2019); ...Lin et al. (2019); ...WHO (2020).]
Another remarkable observation is that a substantial part of the emails (although from different senders and about different subjects) use the trick of including a clickable image (via i.imgur.com, see Fig. 8) that forwards the reader to the intended malicious website. The image is positioned at the bottom of the email and displays a fictive clickable link stating “Unsubscribe Here”. Variants display “Manage subscriptions” or “if you do not wish to continue receiving email newsletters click here”.
Fig. 8.
Example unsubscribe image hosted on i.imgur.com.
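Clickable images of this kind can be found by parsing the HTML body and collecting `<img>` tags nested inside `<a>` tags. The Python sketch below uses the standard library's `html.parser` and keeps only images hosted on i.imgur.com. The class and function names and the example URLs are hypothetical. The sketch does not check the image content (e.g., whether it shows an "Unsubscribe" text).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class ImageLinkFinder(HTMLParser):
    """Collect (link target, image source) pairs for <img> tags nested inside <a> tags."""

    def __init__(self):
        super().__init__()
        self._hrefs = []  # stack of href values of the currently open <a> tags
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._hrefs.append(attrs.get("href"))
        elif tag == "img" and self._hrefs and self._hrefs[-1]:
            self.pairs.append((self._hrefs[-1], attrs.get("src") or ""))

    def handle_endtag(self, tag):
        if tag == "a" and self._hrefs:
            self._hrefs.pop()

def imgur_image_links(html):
    """Link targets of clickable images hosted on i.imgur.com."""
    finder = ImageLinkFinder()
    finder.feed(html)
    finder.close()
    return [href for href, src in finder.pairs if urlsplit(src).hostname == "i.imgur.com"]

html = '<a href="http://phish.example/x"><img src="https://i.imgur.com/abc.png" alt="Unsubscribe Here"></a>'
print(imgur_image_links(html))
```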
4.4. Verification
When we inspect the resulting dataset (with dominant patterns removed), the trends discussed in Section 4.2.2 are still present: emails are mostly received during working hours, and far fewer emails arrive on weekends. This supports assumption 1. For assumption 2, we examined whether any spikes appear in the timeline of received phishing emails (as in Section 4.2.1). We identified multiple peaks. However, we could not find any major patterns in these emails or relations to any specific event, and we therefore conclude that there is support for our second assumption. We did find emails sharing a characteristic, for example, emails containing base64-encoded instructions, or emails with bit.ly links and news headlines. Table 1 lists all our findings in this regard.
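Emails with base64-encoded instructions can be surfaced by decoding long base64 runs in the email body and searching the decoded text for form markup. The Python sketch below illustrates this idea. The 40-character minimum run length and the list of suspicious markers are assumptions. Runs that are not valid base64 or UTF-8 are skipped, so payloads that are split or differently encoded will be missed.

```python
import base64
import binascii
import re

# Assumed minimum run length of 40 base64 characters, optionally padded.
B64_RUN_RE = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
SUSPICIOUS = ("<form", 'method="post"', "method='post'", "password")

def decoded_payloads(text):
    """Decode every long base64 run that is valid base64 and valid UTF-8."""
    out = []
    for run in B64_RUN_RE.findall(text):
        try:
            out.append(base64.b64decode(run, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not a real base64 payload
    return out

def has_encoded_form(text):
    """True if any decoded payload contains a form or credential marker."""
    return any(marker in p.lower() for p in decoded_payloads(text) for marker in SUSPICIOUS)

payload = base64.b64encode(b'<form method="post" action="https://phish.example/login"></form>').decode("ascii")
print(has_encoded_form('<a href="data:text/html;base64,' + payload + '">Open document</a>'))
```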
4.5. Summary of patterns
We first provide an overview of the identified COVID-19 related patterns in Table 1. Then, we show how the results of Sections 4.2.1, 4.2.2 and 4.3 were verified and checked for completeness.
The different ways in which COVID-19 related keywords have been used to frame phishing emails can be classified into three types. Fig. 9(a)–(d) presents these different ways.
The three relation types (see Table 2) show that criminals used the COVID-19 pandemic to persuade recipients to click a malicious link, either out of curiosity/need (Fig. 9(b) and (d)) or out of understanding for disruptions/errors (Fig. 9(c)). In addition, criminals used such keywords to pass spam filters. This happened either intentionally, by actively using COVID-19 related fragments of news articles and similar texts, or unintentionally, since news articles at that time were often related to COVID-19.
Table 2.
Description of relation types of COVID-19 to phishing emails.
Relationship | Relationship description |
---|---|
Direct relation to COVID-19 | The email relates to COVID-19 directly as the main topic of the email (Fig. 9(b)). |
Indirect relation to COVID-19 | The email mentions something about COVID-19, however, the main topic is not directly related to it (Fig. 9(c)). |
Hidden relation to COVID-19 | The email shows no sign of a relation to COVID-19. However, the HTML code of the email contains text which is related to COVID-19 (Fig. 9(a), HTML content not shown). |
No relation | The email shows no relation to COVID-19. |
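The relation types in Table 2 were assigned by inspecting the emails. As a rough illustration of how the categories differ, the Python sketch below maps them to a keyword proxy. It treats a COVID-19 keyword in the subject as "direct", one only in the visible body as "indirect", and one only in hidden HTML content as "hidden". This heuristic and the keyword subset are assumptions: a keyword in the subject does not guarantee that COVID-19 is the main topic. The sketch also assumes that visible and hidden text have already been separated, e.g., with a hidden-style check.

```python
# Hypothetical keyword subset; the study's full keyword list is given in Table A.3.
COVID_KEYWORDS = ("covid", "corona", "pandemic", "lockdown")

def mentions_covid(text):
    """True if the text contains any of the COVID-19 keywords (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in COVID_KEYWORDS)

def relation_type(subject, visible_text, hidden_text):
    """Crude keyword proxy for the relation types of Table 2."""
    if mentions_covid(subject):
        return "direct"    # COVID-19 in the subject, taken as the main topic
    if mentions_covid(visible_text):
        return "indirect"  # mentioned in the visible body only
    if mentions_covid(hidden_text):
        return "hidden"    # present only in invisible HTML content
    return "none"

print(relation_type("Invoice", "Please pay", "BBC: coronavirus cases rise"))
```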
5. Discussion of research contributions
First, we revisit the research questions. We then summarize the contributing findings and discuss the limitations and implications.
5.1. Revisiting research questions
Crime changes and adapts to new circumstances, such as those resulting from the COVID-19 pandemic. Several studies have already highlighted such changes in crime, for example in computer misuse (Office for National Statistics UK, August, 2020) or fraud (InterStats, 2020). This study is concerned with phishing and the question: what effects did COVID-19 have on patterns in phishing emails? Our initial expectation was that phishing would increase and that criminals would try to exploit uncertainties around the virus and the introduced measures in their phishing emails. This study shows that COVID-19 related phishing emails increased sharply after the first restrictions had been introduced in the Netherlands. It also shows that criminals did use COVID-19 related content in their phishing emails.
The findings in Sections 4.2 and 4.2.1 show that phishing emails increased in number at the beginning of the pandemic in the Netherlands. Healthcare-related content, such as the sale of masks, formed prominent topics. Research by Aguirre and Lane (2019) indicates that fraud occurs at the beginning of disasters, which may explain this sharp increase in the first two months.
In general, we identified three different ways in which COVID-19 related content has been used in phishing emails (see Section 4.5). The first is direct use, which gives the impression of providing help, such as applying for financial support or gaining access to goods that protect against the virus. The second is a more passive use, in which the pandemic is mentioned but the main topic is something else. The third is the use of text, e.g., parts of news articles, that is included in the HTML code of the email but is not visible to the reader.
For all our findings, it could be that a very small number of criminals sent a large share of the emails, in which case our findings reflect the behaviour of only a few criminals. We tried to assess this with our trend verification process, but the findings may still not generalize. It is also possible that the three identified ways in which COVID-19 related content is embedded in phishing emails do not reflect a deliberate approach by the criminals. For example, the hidden content in emails may concern the coronavirus not on purpose but because of the increased number of news articles on this topic. This approach would still be a relevant finding, but it would be unrelated to the COVID-19 pandemic. Hidden (invisible) text in phishing emails is a known deceptive technique, as mentioned by Bergholz et al. (2010b). However, this paper shows that COVID-19 related content is also used in this approach. Our findings on the working days of criminals are in line with the research of Ramzan and Wüest (2007) and Lastdrager (2018).
Section 4.2.1 revealed that the volume of phishing emails followed the development of the pandemic in the Netherlands, with a sharp increase after the first measures were introduced. However, we could not find emails that directly relate to specific COVID-19 events, such as newly introduced countermeasures and restrictions. One could argue that phishing emails offering financial support, such as the one shown in Fig. 9(b), are related to the introduction of such relief funds and thus co-occur with COVID-19 related events. However, this research focused on whether phishing emails referenced specific COVID-19 related restrictions, measures, and developments shortly after they had been introduced or observed.
We cannot rule out that criminals referenced specific events in some of their phishing emails. However, this research shows that they did not do so on a large scale. Other researchers, for example Bitaab et al. (2020), also identified phishing emails impersonating a COVID-19 relief fund.
Table 1 lists the identified patterns and, where applicable, references to researchers who identified similar patterns. The table highlights that phishing emails often contained adaptations of known patterns. For example, adding COVID-19 related text in white (invisible) color to an email adapts a pattern that Bergholz et al. (2008) previously described more generally as hidden salting. In contrast, this study identified very few novel patterns, which suggests that attackers favor adaption over innovation for the vast majority of phishing emails. The rational choice theory of crime of Cornish and Clarke (2016) can explain this behaviour, as it argues that decision-making during crime scripts is largely cost-dependent. More specifically, Kirton's Adaption-Innovation Theory supports this cost difference by showing that adaptions are generally associated with smaller resource investments than innovation (Kirton, 1976). The adversary can alter its schemes most cost-effectively by assessing the cost-risk of component alterations (i.e., low risk with low reward is preferred over high risk with high reward) (Junger et al., 2020). This assessment can draw on both the perspective-based view (Hunton, 2009) and the process-based view (Maymí et al., 2017). In the perspective-based view, a phishing scheme would be evaluated for adaption based on seven distinct components: the globalized environment, criminal or illicit intent, data objectives, exploitation tactics, attack methods, networked technology, and evasion and concealment (Hunton, 2009). The process-based view, on the other hand, draws on a known set of procedures and techniques as alternative elements to alter schemes quickly. MITRE ATT&CK, which is based on the Cyber Kill Chain, is an example of such a framework (Maymí et al., 2017).
5.2. Limitations
The study's limitations are as follows. Firstly, there is no comparison with data from (long) before or after the COVID-19 pandemic to detect differences in trends and patterns. Furthermore, the study relied on an unsupervised classification algorithm to classify phishing emails, and such a method has its imperfections (e.g., VirusTotal inaccuracy; Choo et al., 2022b). In addition, the type of emails in the dataset adds to the limitations of our study. The data does not include all emails sent to specific domains, but only those classified as spam by Tesorion and containing a COVID-19 related keyword (see Table A.3). As a result, we could not analyse phishing emails with characteristics and patterns that circumvented Tesorion's spam filter (which is proprietary and unknown to the researchers). This may have affected our observation that phishing patterns were mostly adaptations of existing ones. Furthermore, the data is limited to Dutch firms. Because COVID-19 restrictions differed somewhat on other continents, this might limit the generalization to other social conditions (Ashby, 2020, Boman, Gallupe, 2020, Bullinger, Carr, Packham, 2020, Felson, Jiang, Xu, 2020, Hodgkinson, Andresen, 2020, Mohler et al., 2020). Another limitation is that the data was restricted to English emails. Finally, there is no comparison with non-COVID-19 related phishing emails.
5.3. Implications
There are a number of implications of this study:
• The general rise of phishing, and of COVID-19 related phishing in particular, indicates that adversaries consider phishing a lucrative business. It therefore requires increased attention from policy makers to take preventive measures, e.g., increased resource allocation, more or adapted awareness campaigns, and adapted phishing scheme detection algorithms.
• Topic clustering helps to detect shifts in the phishing schemes in operation. This is useful information for awareness campaign designers, who need to recognize which topics or schemes should be explained to the wider public to prevent victimization.
• The confirmed misuse of the chaos induced by COVID-19 to develop phishing schemes implies that we should anticipate shifts in crime, fraud, and phishing patterns. This could prompt efforts to predict such shifts during other disruptive societal changes and to prepare for foreseeable impactful changes.
6. Conclusion and future work
In this paper, we studied the surge and shift of phishing patterns during the COVID-19 pandemic. We observed a large increase in COVID-19 related phishing emails at the beginning of the pandemic in the Netherlands. Although we could frequently relate COVID-19 content to the schemes, we did not find phishing emails directly related to specific events or measures against the spread of the virus. Additionally, we confirm existing knowledge on time patterns, for example that most phishing emails were received during working hours on weekdays.
The contributions of this research are two-fold: i) a methodology to identify topics in COVID-19 phishing emails, and ii) an analysis of phishing patterns, their adaptions and innovations. For the first part, we observed the following:
• The LDA model worked more effectively than the combination of Doc2Vec and k-means for our dataset, as it can focus on more contextual information than the Doc2Vec model. Furthermore, it is not affected by the unrealistic clustering outcomes that result from the hard clustering of k-means. It is also important for such studies to consider various aspects of the dataset and analysis, such as the number of emails, email length, email content, cluster sizes, the distance between clusters, and the choice of model and its optimization.
• TF-IDF is effective in identifying irrelevant terms, which results in more coherent topics and a less complex model.
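The role of TF-IDF in identifying irrelevant terms can be illustrated with a small pure-Python computation. Terms that occur in every email, such as boilerplate salutations, get an idf of log(N/N) = 0. Their weight therefore stays below any positive threshold, and they become candidates for removal before LDA. The unsmoothed idf, the threshold, and the toy documents are assumptions. Library implementations differ; for example, scikit-learn's `TfidfVectorizer` smooths idf by default, so its weights are never exactly zero.

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF weights with term frequency and idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

def low_information_terms(docs, threshold=0.01):
    """Terms whose highest TF-IDF weight in any document stays below the threshold."""
    best = Counter()
    for w in tfidf(docs):
        for term, value in w.items():
            best[term] = max(best[term], value)
    return sorted(t for t, v in best.items() if v < threshold)

# Toy tokenized emails; "dear" and "customer" occur in every document.
docs = [
    ["dear", "customer", "mask", "offer"],
    ["dear", "customer", "invoice"],
    ["dear", "customer", "news", "covid"],
]
print(low_information_terms(docs))
```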
Regarding the analysis of phishing patterns, we found that:
• The overwhelming presence of COVID-19 in people's lives, for example through lockdowns, contributed to an increased use of COVID-19 related content in phishing emails.
• Offender schemes are modified to fit COVID-19 topics (e.g., face masks), while the existing modi operandi are adapted to this new context (exaptation).
• This adaptive behavior by offenders can be understood through Cornish's rational choice theory of crime and Kirton's Adaption-Innovation Theory.
This paper's findings are relevant for institutes that develop awareness campaigns or phishing detection systems. Furthermore, this work may interest academics who study the development of phishing patterns and who wish to address the challenges and limitations highlighted in the future work.
Future work could address the limitations of our data, which covers COVID-19 related phishing emails during the pandemic but lacks data from before and after the pandemic, as well as data on ‘normal’ phishing during the same period. Moreover, future work could focus on a larger comparative study that could reveal changes in the behavior of criminals or in the principles of persuasion they use. Such a study could include time frames before and after the pandemic as well as a broader scope, such as non-COVID-19 related phishing emails and attachments. This could improve insight into whether and how criminals adapt their phishing schemes to the COVID-19 pandemic. Another aspect would be to enhance the data pre-processing and the classification methods. The pre-processing could be improved in terms of complexity and in the selection of words to exclude for the LDA model. The classification algorithm requires optimization to reduce false positives. A sentiment analysis could be beneficial to gain insight into the tone of phishing emails. In addition, analysing phishing emails written in different languages, including Dutch, on the targeted Dutch domains could give insights into differences in the design of phishing emails across languages.
CRediT authorship contribution statement
Raphael Hoheisel: Conceptualization, Data curation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing. Guido van Capelleveen: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Writing – original draft, Writing – review & editing. Dipti K. Sarmah: Conceptualization, Funding acquisition, Project administration, Methodology, Writing – original draft, Writing – review & editing. Marianne Junger: Conceptualization, Supervision, Writing – original draft, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This research has received funding from the University of Twente, BMS COVID-19 Fund. We thank Tesorion Technology B.V., and in particular Dr. Wouter de Vries, for providing the phishing email data.
Biographies
Raphael Hoheisel is a master’s student at the University of Twente with a specialization in cyber security. For his master’s thesis he worked together with a private security company to study ransomware cases in more detail. His research interests extend from ransomware to the broader area of cybercrime. You can reach him at r.e.hoheisel@utwente.nl.
Guido van Capelleveen is an assistant professor at the Department of Business Analytics, University of Amsterdam. He received his Ph.D. from the University of Twente on the topic of Industrial Symbiosis Recommender Systems. His research interests are in the area of data analytics and data science with applications to real-world practice, currently focused on applications for sustainability. His work has been accepted in venues such as Decision Support Systems, the International Journal of Accounting Information Systems, Environmental Modelling & Software, the Journal of Environmental Management, and Expert Systems with Applications, among others. You can reach Guido at g.c.vancapelleveen@uva.nl.
Dipti K. Sarmah is a Lecturer in the Services and Cyber Security (SCS) group at the University of Twente. During her Ph.D., she worked on developing a high-capacity and robust image steganography method. Her research interests include steganography, cryptography, and their applications, as well as the study and analysis of human behavior in cyber security. Her research has been published in the Journal of Information Security and Applications and Information Sciences, among others. She is also the author of a book published in the Intelligent Systems Reference Library, Springer. Dipti can be reached at d.k.sarmah@utwente.nl.
Marianne Junger received the Ph.D. degree in law from the Free University of Amsterdam, Amsterdam, the Netherlands, in 1990. She is Emeritus Professor of Cyber Security and Business Continuity at the University of Twente, Enschede, the Netherlands. Her research investigates the human factors of fraud and cybercrime; more specifically, victimization, disclosure, and privacy issues. She founded the Crime Science journal together with Pieter Hartel and was an Associate Editor for six years. Her research was sponsored by, among others, the Dutch Police, NWO, ZonMw (for health research), and the European Union.
Tesorion is a Dutch cybersecurity firm located in Enschede and Leusden providing managed cybersecurity services to 500 firms.
Further details regarding these domains have not been shared with us.
Emails about alerts that can be created via https://www.google.com/alerts
Appendix A. Full list of keywords
Table A. Keywords used to filter the emails.
Email filter keywords | | |
---|---|---|
Covid-19 | 2019_ncov | Corona |
COVID-19 | COVID19 | COVID19 |
covid-19 | 2019nCoV | corona |
coronavirusupdates | COVID19 | NCOV19 |
SARS-CoV-2 | Pandemic | Coronapocalypse |
2019-nCoV | CDC | Wuhan |
Kungflu | N95 | epidemic |
Panic Shopping | stayhomechallenge | Chinese virus |
safeathome | stayathome | covididiot |
Coronavirus | COVID19 | coronavirus |
SARS-CoV-2 | outbreak | Wuhanlockdown |
mondmasker | NCOV2019 | lockdown |
covid | Panic Buying | Koronavirus |
Novel coronavirus | Wuhanvirus | coronaviruses |
Facemask | Coronavirus disease 2019 | PPE shortage |
mondkap | flatten the curve | SocialDistancing |
face mask | COVID 19 | coronavirusoutbreak |
Wuhancoronavirus | Corona virus |
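Table A lists the keywords but not the rule used to match them. The Python sketch below shows one way such a filter could work, assuming case-insensitive substring matching over the subject and plain-text body. The function name `matches_covid_keywords`, the abbreviated keyword subset, and the example emails are hypothetical and are not taken from the study's pipeline.

```python
# Hypothetical sketch of a COVID-19 keyword filter (not the study's actual code).
# Assumption: case-insensitive substring matching over subject + plain-text body.

# Abbreviated, illustrative subset of the keywords listed in Table A.
KEYWORDS = [
    "COVID-19", "COVID19", "Corona", "SARS-CoV-2", "2019-nCoV", "Pandemic",
    "Wuhan", "N95", "lockdown", "mondmasker", "mondkap", "face mask",
    "SocialDistancing", "flatten the curve", "PPE shortage",
]

# casefold() folds case variants (e.g. "Corona" vs "corona") into one entry.
NORMALISED_KEYWORDS = {kw.casefold() for kw in KEYWORDS}


def matches_covid_keywords(subject: str, body: str) -> list[str]:
    """Return the sorted, case-folded keywords found in the subject or body."""
    text = f"{subject}\n{body}".casefold()
    return sorted(kw for kw in NORMALISED_KEYWORDS if kw in text)


if __name__ == "__main__":
    examples = [
        ("Uw bestelling mondmaskers", "Klik hier om uw levering te bevestigen."),
        ("Invoice #4411", "Please find the attached invoice."),
        ("CORONAVIRUS update", "Read the latest COVID-19 guidance."),
    ]
    for subject, body in examples:
        print(f"{subject!r}: {matches_covid_keywords(subject, body)}")
```

Substring matching favours recall: a short keyword such as "corona" also matches unrelated words like "coronation", so a word-boundary regular expression would be a stricter alternative at the cost of missing compounds such as "coronavirusupdates".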
Data availability
The data that has been used is confidential.
References
- Adebowale M., Lwin K., Sánchez E., Hossain M. Intelligent web-phishing detection and protection scheme using integrated features of images, frames and text. Expert Syst. Appl. 2019;115:300–313. doi: 10.1016/j.eswa.2018.07.067.
- Afandi N.A., Hamid I.R.A. COVID-19 phishing detection based on hyperlink using k-nearest neighbor (KNN) algorithm. Appl. Inf. Technol. Comput. Sci. 2021;2(2):287–301. https://publisher.uthm.edu.my/periodicals/index.php/aitcs/article/view/2317
- Aguirre B., Lane D. Fraud in disaster: rethinking the phases. Int. J. Disaster Risk Reduct. 2019;39:101232. doi: 10.1016/j.ijdrr.2019.101232.
- Akdemir N., Yenal S. How phishers exploit the coronavirus pandemic: a content analysis of COVID-19 themed phishing emails. SAGE Open. 2021;11(3):21582440211031879. doi: 10.1177/21582440211031879.
- Akhtar, M., Kumar, A., Ghosal, D., Ekbal, A., Bhattacharyya, P., 2017. A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. pp. 540–546. doi: 10.18653/v1/D17-1057
- Aleroud A., Zhou L. Phishing environments, techniques, and countermeasures: a survey. Comput. Secur. 2017;68:160–196. doi: 10.1016/j.cose.2017.04.006.
- Alghamdi A. 2022 2nd International Conference on Computing and Information Technology (ICCIT) 2022. Cybersecurity threats to healthcare sectors during COVID-19; pp. 87–92.
- Al-Qahtani A.F., Cresci S. The COVID-19 scamdemic: a survey of phishing attacks and their countermeasures during COVID-19. IET Inf. Secur. 2022;16:324–345. doi: 10.1049/ise2.12073.
- Alsmadi I., Alhami I. Clustering and classification of email contents. J. King Saud Univ. - Comput. Inf. Sci. 2015;27(1):46–57. doi: 10.1016/j.jksuci.2014.03.014.
- Alzubaidi A. Measuring the level of cyber-security awareness for cybercrime in Saudi Arabia. Heliyon, Natl. Lib. Med. 2021;7(1). doi: 10.1016/j.heliyon.2021.e06016.
- APWG. Technical Report. APWG; 2020. Phishing Activity Trend Reports. 3rd Quarter 2020. Accessed: 2020-11-26
- APWG. Technical Report. APWG; 2020. Trend Reports. 1st Quarter 2020 Plus COVID-19 Coverage. Accessed: 2020-11-23
- APWG. Technical Report. APWG; 2022. Trend Reports. 1st Quarter 2022. Accessed: 2022-06-30
- Ashby M.P. Initial evidence on the relationship between the coronavirus pandemic and crime in the United States. Crime Sci. 2020;9:1–16. doi: 10.1186/s40163-020-00117-6.
- Atkeson A. Report. National Bureau of Economic Research; 2020. What Will be the Economic Impact of COVID-19 in the US? Rough Estimates of Disease Scenarios.
- Basnet R.B., Sung A.H. International Conference on Information Security and Artificial Intelligence (ISAI). Citeseer; 2010. Classifying phishing emails using confidence-weighted linear classifiers; pp. 108–112.
- Bergholz A., De Beer J., Glahn S., Moens M.-F., Paaß G., Strobel S. New filtering approaches for phishing email. J. Comput. Secur. 2010;18(1):7–35.
- Bergholz A., Paass G., Reichartz F., Strobel S., Moens M.-F., Witten B. CEAS. vol. 9. 2008. Detecting known and new salting tricks in unwanted emails.
- Bhardwaj A., Sapra V., Kumar A., Kumar N., Arthi S. Why is phishing still successful? Comput. Fraud Secur. 2020;2020(9):15–19.
- Bitaab M., Cho H., Oest A., Zhang P., Sun Z., Pourmohamad R., Kim D., Bao T., Wang R., Shoshitaishvili Y., Doupé A., Ahn G.-J. 2020 APWG Symposium on Electronic Crime Research (eCrime) 2020. Scam pandemic: how attackers exploit public fear through phishing; pp. 1–10.
- Blancaflor E.B., Alfonso A.B., Banganay K., et al. Proceedings of the International Conference on Industrial Engineering and Operations Management. 2021. Let’s go phishing: a phishing awareness campaign using smishing, email phishing, and social media phishing tools.
- Blei D.M., Ng A.Y., Jordan M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003;3(Jan):993–1022.
- Boman J.H., Gallupe O. Has COVID-19 changed crime? Crime rates in the United States during the pandemic. Am. J. Crim. Justice. 2020;45(4):537–545. doi: 10.1007/s12103-020-09551-3.
- Budiarto A., Rahutomo R., Putra H.N., Cenggoro T.W., Kacamarga M.F., Pardamean B. Unsupervised news topic modelling with Doc2Vec and spherical clustering. Procedia Comput. Sci. 2021;179:40–46. doi: 10.1016/j.procs.2020.12.007; 5th International Conference on Computer Science and Computational Intelligence 2020
- Buil-Gil D., Miró-Llinares F., Moneva A., Kemp S., Díaz-Castaño N. Cybercrime and shifts in opportunities during COVID-19: a preliminary analysis in the UK. Eur. Soc. 2020;0(0):1–13. doi: 10.1080/14616696.2020.1804973.
- Bullinger L.R., Carr J.B., Packham A. Report. National Bureau of Economic Research; 2020. COVID-19 and Crime: Effects of Stay-at-Home Orders on Domestic Violence (Pre-Print). https://www.nber.org/papers/w27667
- Andrew, S., Yeung, J., 2020. Masks can’t stop the coronavirus in the US, but hysteria has led to bulk-buying, price-gouging and serious fear for the future. Accessed: 2023-01-14. https://edition.cnn.com/2020/02/29/health/coronavirus-mask-hysteria-us-trnd/index.html.
- Chawki M. In: Intelligent Computing. Arai K., editor. Springer International Publishing; Cham: 2021. Cybercrime in the context of COVID-19; pp. 986–1002.
- Chen E., Lerman K., Ferrara E. Tracking social media discourse about the COVID-19 pandemic: development of a public coronavirus Twitter data set. JMIR Public Health Surveill. 2020;6(2):e19273. doi: 10.2196/19273.
- Cats, O., Hoogendoorn, S., 2020. Accessed: 2023-02-27. https://www.tudelft.nl/en/covid/exit-strategies/the-role-of-and-impact-on-mobility-on-the-course-of-the-virus/.
- Choo, E., Nabeel, M., De Silva, R., Yu, T., Khalil, I., 2022a. A large scale study and classification of VirusTotal reports on phishing and malware URLs. doi: 10.48550/ARXIV.2205.13155
- Choo, E., Nabeel, M., De Silva, R., Yu, T., Khalil, I., 2022b. A large scale study and classification of VirusTotal reports on phishing and malware URLs. arXiv preprint arXiv:2205.13155
- Cialdini, R. B., Sagarin, B. J., 2005. Principles of interpersonal influence.
- Cinelli M., Quattrociocchi W., Galeazzi A., Valensise C.M., Brugnoli E., Schmidt A.L., Zola P., Zollo F., Scala A. The COVID-19 social media infodemic. Sci. Rep. 2020;10(1):16598. doi: 10.1038/s41598-020-73510-5.
- CNBC, 2020. Cybercrime ramps up amid coronavirus chaos, costing companies billions. Accessed: 2020-11-23, https://www.cnbc.com/2020/07/29/cybercrime-ramps-up-amid-coronavirus-chaos-costing-companies-billions.html.
- Cornish D.B., Clarke R.V. Environmental Criminology and Crime Analysis. Routledge; 2016. The rational choice perspective; pp. 48–80.
- Crummy, 2021. Beautiful Soup. Accessed: 2021-12-15, https://www.crummy.com/software/BeautifulSoup/.
- Cucinotta D., Vanelli M. WHO declares COVID-19 a pandemic. Acta Bio Medica. 2020;91(1):157. doi: 10.23750/abm.v91i1.9397.
- de Haas M., Faber R., Hamersma M. How COVID-19 and the Dutch ‘intelligent lockdown’ change activities, work and travel behaviour: evidence from longitudinal data in the Netherlands. Transp. Res. Interdiscip. Perspect. 2020;6:100150. doi: 10.1016/j.trip.2020.100150.
- Drury V., Lux L., Meyer U. Proceedings of the 17th International Conference on Availability, Reliability and Security. Association for Computing Machinery; New York, NY, USA: 2022. Dating phish: an analysis of the life cycles of phishing attacks and campaigns.
- Europol. Technical Report. Europol; 2020. Pandemic Profiteering: How Criminals Exploit the COVID-19 Crisis. Accessed: 2020-11-23
- Felson M., Jiang S., Xu Y. Routine activity effects of the COVID-19 pandemic on burglary in Detroit, March, 2020. Crime Sci. 2020;9(1):1–7. doi: 10.1186/s40163-020-00120-x.
- Ferreira A., Teles S. Persuasion: how phishing emails can influence users and bypass security measures. Int. J. Human-Computer Stud. 2019;125:19–31. doi: 10.1016/j.ijhcs.2018.12.004.
- Fette I., Sadeh N., Tomasic A. Proceedings of the 16th International Conference on World Wide Web. Association for Computing Machinery; New York, NY, USA: 2007. Learning to detect phishing emails; pp. 649–656.
- Fraudehelpdesk, 2023. About Fraud Help Desk. Accessed: 2023-01-14, https://www.fraudehelpdesk.nl/fraudhelpdesk-the-dutch-national-anti-fraud-hotline/.
- Furnell S., Emm D., Papadaki M. The challenge of measuring cyber-dependent crimes. Comput. Fraud Secur. 2015;2015(10):5–12. doi: 10.1016/S1361-3723(15)30093-2.
- Gafni R., Pavel T. Cyberattacks against the health-care sectors during the COVID-19 pandemic. Inf. Comput. Secur. 2021;30(1):137–150. doi: 10.1108/ICS-05-2021-0059.
- Gansterer W.N., Pölz D. European Conference on Information Retrieval. Springer; 2009. E-mail classification for phishing defense; pp. 449–460.
- Gibert K., Sànchez-Marrè M., Izquierdo J. A survey on pre-processing techniques: relevant issues in the context of environmental data mining. AI Commun. 2016;29(6):627–663.
- Goldkuhl G. Pragmatism vs. interpretivism in qualitative information systems research. Eur. J. Inf. Syst. 2012;21(2):135–146.
- Google, 2020. Compact language detector v3 (CLD3). Accessed: 2021-06-21, https://github.com/google/cld3.
- Groenendaal J., Helsloot I. Cyber resilience during the COVID-19 pandemic crisis: a case study. J. Conting. Crisis Manag. 2021;29(4):439–444. doi: 10.1111/1468-5973.12360.
- Halevi, T., Memon, N., Nov, O., 2015. Spear-Phishing in the Wild: A Real-World Study of Personality, Phishing Self-Efficacy and Vulnerability to Spear-Phishing Attacks (January 2, 2015).
- Hamid I.R.A., Abawajy J. 2011, Algorithms and Architectures for Parallel Processing. ICA3PP 2011. Lecture Notes in Computer Science. vol. 7017. 2011. Hybrid feature selection for phishing email detection; pp. 266–275.
- Hamid I.R.A., Abawajy J.H. 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. IEEE; 2013. Profiling phishing email based on clustering approach; pp. 628–635.
- Hardyns W., Schokkenbroek J.M., Schapansky E., Keygnaert I., Ponnet K., Vandeviver C. Technical Report. Ghent University; 2021. Patterns of Crime During the COVID-19 Pandemic in Belgium. http://doi.org/10.31235/osf.io/r34x8
- Harris Z.S. Distributional structure. Word. 1954;10(2–3):146–162.
- Hodgkinson T., Andresen M.A. Show me a man or a woman alone and I’ll show you a saint: changes in the frequency of criminal incidents during the COVID-19 pandemic. J. Crim. Justice. 2020;69:101706. doi: 10.1016/j.jcrimjus.2020.101706.
- Hollnagel E. Resilience Engineering in Practice. CRC Press; 2017. Epilogue: RAG – the resilience analysis grid; pp. 275–296.
- Holtfreter K., Reisig M.D., Pratt T.C. Low self-control, routine activities, and fraud victimization. Criminology. 2008;46(1):189–220. doi: 10.1111/j.1745-9125.2008.00101.x.
- Horawalavithana S., De Silva R., Nabeel M., Elvitigala C., Wijesekara P., Iamnitchi A. In: Social, Cultural, and Behavioral Modeling. Thomson R., Hussain M.N., Dancy C., Pyke A., editors. Springer International Publishing; Cham: 2021. Malicious and low credibility URLs on Twitter during the AstraZeneca COVID-19 vaccine development; pp. 3–12.
- Hu H., Peng P., Wang G. 2019 IEEE Symposium on Security and Privacy (SP) 2019. Characterizing pixel tracking through the lens of disposable email services; pp. 365–379.
- Hunton P. The growing phenomenon of crime and the internet: a cybercrime execution and analysis model. Comput. Law Secur. Rev. 2009;25(6):528–535. doi: 10.1016/j.clsr.2009.09.005.
- Ispahany J., Islam R. 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and Other Affiliated Events (PerCom Workshops) 2021. Detecting malicious COVID-19 URLs using machine learning techniques; pp. 718–723.
- Jáñez-Martino F., Alaiz-Rodríguez R., González-Castro V., Fidalgo E., Alegre E. A review of spam email detection: analysis of spammer strategies and the dataset shift problem. Artif. Intell. 2022;56:1–29.
- Junger M., Wang V., Schlömer M. Fraud against businesses both online and offline: crime scripts, business characteristics, efforts, and benefits. Crime Sci. 2020;9(1):1–15.
- Kaliňák, V., 2021. Psychology of phishing attacks during crises: the case of COVID-19 pandemic.
- Karim A., Azam S., Shanmugam B., Kannoorpatti K. Efficient clustering of emails into spam and ham: the foundational study of a comprehensive unsupervised framework. IEEE Access. 2020;8:154759–154788. doi: 10.1109/ACCESS.2020.3017082.
- Kawaoka R., Chiba D., Watanabe T., Akiyama M., Mori T. International Conference on Passive and Active Network Measurement. Springer; 2021. A first look at COVID-19 domain names: origin and implications; pp. 39–53.
- Kemp S., Buil-Gil D., Moneva A., Miró-Llinares F., Díaz-Castaño N. Empty streets, busy internet: a time-series analysis of cybercrime and fraud trends during COVID-19. J. Contemp. Crim. Justice. 2021;37(4):480–501. doi: 10.1177/10439862211027986.
- Kennedy J.P., Rorie M., Benson M.L. COVID-19 frauds: an exploratory study of victimization during a global crisis. Criminol. Public Policy. 2021;20(3):493–543. doi: 10.1111/1745-9133.12554.
- Kennedy L.W., Forde D.R. Routine activities and crime: an analysis of victimization in Canada. Criminology. 1990;28(1):137–152.
- Kirlappos I., Sasse M.A. Security education against phishing: a modest proposal for a major rethink. IEEE Secur. Privacy. 2011;10(2):24–32.
- Kirton M. Adaptors and innovators: a description and measure. J. Appl. Psychol. 1976;61(5):622–629. doi: 10.1037/0021-9010.61.5.622.
- Kousha, K., Thelwall, M., 2020. COVID-19 publications: database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts. arXiv:2004.10400
- Kouzy R., Abi Jaoude J., Kraitem A., El Alam M.B., Karam B., Adib E., Zarka J., Traboulsi C., Akl E.W., Baddour K. Coronavirus goes viral: quantifying the COVID-19 misinformation epidemic on Twitter. Cureus. 2020;12(3):e7255. doi: 10.7759/cureus.7255.
- InterStats, 2020. Analyse conjoncturelle des crimes et délits enregistrés par la police et la gendarmerie à la fin du mois d'août 2020. Paris, France: Service statistique ministériel de la sécurité intérieure. Retrieved from: https://www.interieur.gouv.fr/Interstats/Actualites/Interstats-Conjoncture-N-60-Septembre-2020
- Kumaran, N., Lugani, S., 2020. Protecting businesses against cyber threats during COVID-19 and beyond. Google Cloud. Accessed: 2023-02-27. https://cloud.google.com/blog/products/identity-security/protecting-against-cyber-threats-during-covid-19-and-beyond.
- Laan, J., 2021. The impact of the corona-pandemic on the business model of cybercrime. http://essay.utwente.nl/87830/.
- Lallie H.S., Shepherd L.A., Nurse J.R., Erola A., Epiphaniou G., Maple C., Bellekens X. Cyber security in the age of COVID-19: a timeline and analysis of cyber-crime and cyber-attacks during the pandemic. Comput. Secur. 2021;105:102248. doi: 10.1016/j.cose.2021.102248.
- Lastdrager E. From Fishing to Phishing. University of Twente, Netherlands; 2018.
- Le, Q. V., Mikolov, T., 2014. Distributed representations of sentences and documents. CoRR abs/1405.4053. http://arxiv.org/abs/1405.4053.
- Legg P., Blackman T. 2019 International Conference on Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA) 2019. Tools and techniques for improving cyber situational awareness of targeted phishing attacks; pp. 1–4.
- Levi M., Smith R.G. Technical Report. Australian Institute of Criminology; 2021. Fraud and its Relationship to Pandemics and Economic Crises: From Spanish flu to COVID-19.
- Lin, T., Capecci, D. E., Ellis, D. M., Rocha, H. A., Dommaraju, S., Oliveira, D. S., Ebner, N. C., 2019. Susceptibility to spear-phishing emails: effects of internet user demographics and email content 26(5). doi: 10.1145/3336141
- Liu C., Stamm S. Proceedings of the Anti-Phishing Working Groups 2nd Annual eCrime Researchers Summit. 2007. Fighting unicode-obfuscated spam; pp. 45–59.
- Lloyd S. Least squares quantization in PCM. IEEE Trans. Inf. Theory. 1982;28(2):129–137.
- Luhn H.P. A statistical approach to mechanized encoding and searching of literary information. IBM J. Res. Dev. 1957;1(4):309–317. doi: 10.1147/rd.14.0309.
- Mabey, B., 2021. pyLDAvis 3.1. Accessed 2021-12-10. https://pypi.org/project/pyLDAvis/.
- Manning C.D., Raghavan P., Schütze H., et al. vol. 1. Cambridge University Press, Cambridge; 2008. Introduction to Information Retrieval.
- Martin D., Wu H., Alsaid A. Hidden surveillance by web sites: web bugs in contemporary use. Commun. ACM. 2003;46(12):258–264.
- Mathieu E., Ritchie H., Rodés-Guirao L., Appel C., Giattino C., Hasell J., Macdonald B., Dattani S., Beltekian D., Ortiz-Ospina E., Roser M. Our World in Data. 2020. Coronavirus pandemic (COVID-19). https://ourworldindata.org/coronavirus
- Mailtrap, 2022. &nbsp; and HTML space challenges and tricks. Accessed 2022-01-07. https://mailtrap.io/blog/nbsp/.
- Matplotlib.org, 2022. Matplotlib - Visualization with Python. Accessed: 2022-06-30. https://matplotlib.org/.
- Maymí F., Bixler R., Jones R., Lathrop S. 2017 IEEE International Conference on Big Data (Big Data). IEEE; 2017. Towards a definition of cyberspace tactics, techniques and procedures; pp. 4674–4679.
- McGrath, D. K., Gupta, M., 2008. Behind phishing: an examination of phisher modi operandi. https://www.usenix.org/legacy/event/leet08/tech/full_papers/mcgrath/mcgrath_html/.
- McRae C.M., Vaughn R.B. 2007 40th Annual Hawaii International Conference on System Sciences (HICSS’07) 2007. Phighting the phisher: using web bugs and honeytokens to investigate the source of phishing attacks; p. 270c.
- Mimecast, 2020. Coronavirus phishing attacks speed up across the globe | Mimecast blog. Accessed: 2020-08-10. https://www.mimecast.com/blog/coronavirus-phishing-attacks-speed-up-globally/.
- Mohler G., Bertozzi A.L., Carter J., Short M.B., Sledge D., Tita G.E., Uchida C.D., Brantingham P.J. Impact of social distancing during COVID-19 pandemic on crime in Los Angeles and Indianapolis. J. Crim. Just. 2020;68:101692. doi: 10.1016/j.jcrimjus.2020.101692.
- Moore T., Clayton R. Proceedings of the Anti-Phishing Working Groups 2nd Annual eCrime Researchers Summit. Association for Computing Machinery; New York, NY, USA: 2007. Examining the impact of website take-down on phishing; pp. 1–13.
- Nicola M., Alsafi Z., Sohrabi C., Kerwan A., Al-Jabir A., Iosifidis C., Agha M., Agha R. The socio-economic implications of the coronavirus pandemic (COVID-19): a review. Int. J. Surg. 2020;78:185. doi: 10.1016/j.ijsu.2020.04.018.
- Niu W., Zhang X., Yang G., Ma Z., Zhuo Z. 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC). IEEE; 2017. Phishing emails detection using CS-SVM; pp. 1054–1059.
- Ministerie van Volksgezondheid W. e. S., 2023. Confirmed cases | Coronavirus Dashboard | Government.nl. Accessed: 2022-03-27. https://coronadashboard.government.nl.
- NLTK, 2021. Natural language toolkit (NLTK). Accessed: 2021-12-15, https://github.com/nltk/nltk.
- Oest A., Zhang P., Wardman B., Nunes E., Burgis J., Zand A., Thomas K., Doupé A., Ahn G.-J. Proceedings of the 29th USENIX Conference on Security Symposium. USENIX Association; USA: 2020. Sunrise to sunset: analyzing the end-to-end life cycle and effectiveness of phishing attacks at scale.
- Office for National Statistics UK, August, 2020. https://www.gov.uk/government/statistics/coronavirus-and-crime-in-england-and-wales-august-2020.
- Patgiri R., Katari H., Kumar R., Sharma D. International Conference on Distributed Computing and Internet Technology. Springer; 2019. Empirical study on malicious URL detection using machine learning; pp. 380–388.
- Patil D.R., Patil J.B. Malicious URLs detection using decision tree classifiers and majority voting technique. Cybern. Inf. Technol. 2018;18(1):11–29.
- Petelka J., Zou Y., Schaub F. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019. Put your warning where your link is: improving and evaluating email phishing warnings; pp. 1–15.
- Pletinckx S., Jansen G.H., Brussen A., van Wegberg R. 2021 12th International Conference on Information and Communication Systems (ICICS) 2021. Cash for the register? Capturing rationales of early COVID-19 domain registrations at internet-scale; pp. 41–48.
- Rameem Zahra S., Ahsan Chishti M., Iqbal Baba A., Wu F. Detecting COVID-19 chaos driven phishing/malicious URL attacks by a fuzzy logic and data mining based intelligence system. Egyptian Inform. J. 2021. doi: 10.1016/j.eij.2021.12.003.
- Ramzan Z., Wüest C. CEAS. Citeseer; 2007. Phishing attacks: analyzing trends in 2006.
- Rechtspraak, D., 2022. Uitspraak, afdeling strafrecht. Accessed: 2023-01-14, https://uitspraken.rechtspraak.nl/#!/details?id=ECLI:NL:GHARL:2022:10845.
- Řehuřek, R., 2021. Gensim: topic modelling for humans. Accessed: 2021-12-15, https://radimrehurek.com/gensim/models/ldamodel.html.
- Röder M., Both A., Hinneburg A. Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. Association for Computing Machinery; New York, NY, USA: 2015. Exploring the space of topic coherence measures; pp. 399–408.
- Sahingoz O.K., Buber E., Demir O., Diri B. Machine learning based phishing detection from URLs. Expert Syst. Appl. 2019;117:345–357.
- Sarno D.M., Black J., Harris K., Harris M., Koontz P., Paradise E. Proceedings of the Human Factors and Ergonomics Society Annual Meeting. vol. 66. SAGE Publications Sage CA: Los Angeles, CA; 2022. Fall for one, fall for all: understanding deception detection in phishing emails, scam texts messages, and fake news headlines; p. 1115.
- Sharevski F., Devine A., Pieroni E., Jachim P. Proceedings of the 2022 European Symposium on Usable Security. Association for Computing Machinery; New York, NY, USA: 2022. Phishing with malicious QR codes; pp. 160–171.
- Sherman L.W., Gartin P.R., Buerger M.E. Hot spots of predatory crime: routine activities and the criminology of place. Criminology. 1989;27(1):27–56.
- Sood A.K., Talluri S., Nagal A., SL R.R., Chaturvedi R. The COVID-19 threat landscape. Comput. Fraud Secur. 2021;2021(9):10–15.
- Spärck Jones K. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 1972;28(1):11–21. doi: 10.1108/eb026526.
- Tilley N., Sidebottom A. Wiley; Oxford, UK: 2015. Routine Activities and Opportunity Theory; pp. 331–348; book section 21
- Toolan F., Carthy J. 2010 eCrime Researchers Summit, 2010. vol. 7017. 2010. Feature selection for spam and phishing detection; pp. 1–12.
- Tsow A., Jakobsson M. Indiana University; 2007. Deceit and Deception: A Large User Study of Phishing; p. 2007; Retrieved September 9
- Van Der Heijden A., Allodi L. 28th USENIX Security Symposium (USENIX Security 19) 2019. Cognitive triaging of phishing attacks; pp. 1309–1326.
- van Kesteren J., van Dijk J., Mayhew P. The international crime victims surveys: a retrospective. Int. Rev. Vict. 2013;20(1):49–69.
- Venkatesha S., Reddy K.R., Chandavarkar B. Social engineering attacks during the COVID-19 pandemic. SN Comput. Sci. 2021;2(2):1–9. doi: 10.1007/s42979-020-00443-1.
- Verma R., Das A. Proceedings of the 3rd ACM on International Workshop on Security and Privacy Analytics. 2017. What’s in a URL: fast feature extraction and malicious URL detection; pp. 55–63.
- VirusTotal, 2020. VirusTotal API: getting started with v2. Accessed: 2020-11-23, https://developers.virustotal.com/reference/overview.
- VirusTotal, 2022. How it works. Accessed: 2022-01-08, https://support.virustotal.com/hc/en-us/articles/115002126889-How-it-works.
- Walker P., Whittaker C., Watson O., Baguelin M., Ainslie K., Bhatia S., Bhatt S., Boonyasiri A., Boyd O., Cattarino L. Journal Article. Imperial College London; 2020. Report 12: The Global Impact of COVID-19 and Strategies for Mitigation and Suppression.
- Wang G., Kwok S.W.H. 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) 2021. Using k-means clustering method with Doc2Vec to understand the Twitter users’ opinions on COVID-19 vaccination; pp. 1–4.
- Xia P., Nabeel M., Khalil I., Wang H., Yu T. Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy. Association for Computing Machinery; New York, NY, USA: 2021. Identifying and characterizing COVID-19 themed malicious domain campaigns; pp. 209–220.
- Yearwood J., Mammadov M., Webb D. Profiling phishing activity based on hyperlinks extracted from phishing emails. Soc. Netw. Anal. Min. 2012;2(1):5–16.
- Zubair M., Asif Iqbal M., Shil A., Haque E., Moshiul Hoque M., Sarker I.H. In: Hybrid Intelligent Systems. Abraham A., Hanne T., Castillo O., Gandhi N., Nogueira Rios T., Hong T.-P., editors. Springer International Publishing; Cham: 2021. An efficient k-means clustering algorithm for analysing COVID-19; pp. 422–432.
- WHO, 2020. Shortage of personal protective equipment endangering health workers worldwide. Accessed: 2023-01-14. https://www.who.int/news/item/03-03-2020-shortage-of-personal-protective-equipment-endangering-health-workers-worldwide.
- World Health Organization, 2022. Timeline: WHO’s COVID-19 response. Accessed: 2022-03-07. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/interactive-timeline.