Sensors. 2022 Mar 4;22(5):2017. doi: 10.3390/s22052017

Table 4. List of various cyber-attack datasets.

Reference | Year | Dataset Used | Dataset Size | Format | Details about the Dataset/Brief Description (listed as bullets under each entry)
[104] | 2016 | CAIDA DDoS 2007 and MIT DARPA dataset | 5.3 GB | pcap (tcpdump) format
  • The dataset includes an hour of anonymized traffic records from a DDoS attack on 4 August 2007.
  • This form of denial-of-service attack attempts to prevent users from accessing the server by consuming all of the server's computational resources and the network's bandwidth.

[105] | 2015 | Botnet [Zeus (Snort), Zeus (NETRESEC), Zeus-2 (NIMS), Conficker (CAIDA) and ISOT-Uvic] | 14 GB of packets | packet
  • A network-based dataset that spans diverse networks and captures emulated traffic (both normal and attack traffic).
  • The dataset is well labelled but not balanced.

[106] | 2009 | NSL-KDD | approx. 4 GB compressed/150k data points | tcpdump data
  • The training set contains no redundant or duplicate records.
  • Only a limited number of records are selected for training and testing.

[107] | 2011 | ISOT | 11 GB of packets | packet
  • The ISOT dataset was compiled from two different datasets of malicious traffic from the French chapter of the Honeynet Project, covering the Storm and Waledac botnets, respectively.

[54] | 2016 | UNSW-NB-15 | 100 GB | CSV files
  • The Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) used the IXIA PerfectStorm tool to generate a combination of realistic modern normal activities and synthetic contemporary attack behaviours from network traffic.

[108] | 2017 | Unified Host and Network | 150 GB of flows (compressed) | bidirectional flows, logs
  • The Unified Host and Network Dataset is a subset of network and computer (host) events collected over a 90-day period from the Los Alamos National Laboratory enterprise network.

[109] | 2011 | Yahoo Password Frequency Corpus | 130.64 kB (compressed) | txt files
  • The dataset contains sanitised password frequency lists obtained from Yahoo in May 2011.

[110] | 2014 | 500K HTTP Headers | 75 MB | CSV files
  • HTTP headers collected by crawling the top 500K sites (as ranked by Alexa).

[111] | 2014 | The Drebin Dataset | 6 MB (approx.) | txt log, CSV and XML files
  • The goal of the dataset is to promote Android malware research and to allow comparisons of different detection methods.
  • The dataset contains 5560 applications from 179 distinct malware families, collected between August 2010 and October 2012.

[112] | 2008 | Common Crawl | 320 TiB | WARC and ARC format
  • Since 2008, the Common Crawl corpus has accumulated petabytes of data.
  • It includes raw web page data, extracted metadata, and text extractions, comprising over 50 billion web pages.
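
The formats in the table (pcap/tcpdump captures, CSV feature files, WARC archives, and so on) largely determine how a dataset is consumed in practice. As a minimal sketch, the Python snippet below shows one way a pcap-style capture (such as CAIDA DDoS 2007) might be inspected with scapy and a CSV-style dataset (such as UNSW-NB-15) loaded with pandas; the file names and the "label" column are illustrative assumptions, not paths or schemas distributed with these datasets.

```python
# Illustrative only: file names below are placeholders, not the official
# distribution paths of the datasets listed in Table 4.
from collections import Counter

import pandas as pd
from scapy.all import IP, rdpcap

# --- pcap-style dataset (e.g., CAIDA DDoS 2007, ISOT) ----------------------
packets = rdpcap("ddos_trace_sample.pcap")            # read a capture file
per_source = Counter(
    pkt[IP].src for pkt in packets if pkt.haslayer(IP)
)
print("Top talkers:", per_source.most_common(5))      # crude flood indicator

# --- CSV-style dataset (e.g., UNSW-NB-15 feature files) --------------------
flows = pd.read_csv("UNSW_NB15_sample.csv")           # one of the CSV files
print(flows.shape)                                    # rows x feature columns
if "label" in flows.columns:                          # attack/normal label, if present
    print(flows["label"].value_counts())
```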