Table 10.
Advantages and limitations of intra-vehicular environment security datasets.
Dataset | Advantages | Limitations |
---|---|---|
Car-Hacking | The attack captures are very long and contain a large number of instances per attack. This dataset seems to be the most widely used in the CAN IDS research community. | All the attack captures contain a significant artifact of data collection that may pose a problem for researchers using this data. Ambient and attack data are in different formats. |
OTIDS | It is the only open dataset that includes remote frames and responses. Its fuzzing attack is the sole example of its kind in an open dataset. | The documentation of the injection message intervals is unclear. The “impersonation attack” is not a real masquerade attack because the legitimate node’s message transmission is suspended. |
Survival | It contains real attacks on multiple vehicles. This dataset provides evidence for the importance of the duration during which the bus is occupied by a message. | All of the attacks are basic and can be detected with a very simple frequency-based detector. Only 60–90 s of data are provided per vehicle, which is likely not sufficient for robust training. The ambient data and attack data are in differently formatted CSVs, which is undesirable. |
SynCAN | This is the only dataset (other than ours) that contains attacks targeting a single signal. This dataset contains the most nuanced masquerade attacks currently available. | Synthetic data are clearly an imperfect proxy for real data. Simulated attacks are inherently problematic since their effect on a vehicle cannot be verified. |
TU/e v2 | This dataset includes the only diagnostic protocol attack publicly available and the only suspension attack (simulated) in real CAN data. The same set of attacks is available for testing on multiple vehicles/CANs. | Attack labels are in an unstructured text file, so there is no way of programmatically reading what/when packets were injected. Most of the attacks are unrealistic. |
ORNL | The published data have been obfuscated in a way that maintains the anonymity of the vehicle while preserving all important aspects of the data for an IDS. | Unlabeled data. |
CrySyS | This dataset can be easily extended to add new attacks. This is the only dataset furnished with descriptions of the driver’s actions during ambient captures, which is highly valuable for training and testing an IDS. | Because the attacks are added in post-processing, there is no guarantee that they would actually affect vehicle function. Can-Log-Infector’s implementation can cause many problems. |
SIMPLE | The dataset handles both periodic and aperiodic messages. | Documentation needs to be clarified. |
Bi | It provides a robust foundation for IDS under challenging conditions and for comprehensive evaluation of detection capabilities. | The dataset is private and not publicly accessible. |
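
The Survival row notes that its attacks can be detected with a very simple frequency-based detector. As a rough illustration of what such a detector might look like, the sketch below learns a typical inter-arrival time per CAN ID from ambient traffic and flags frames that arrive much sooner than expected. It is a minimal sketch, not the method of any dataset's authors: the function names `learn_periods` and `flag_fast_frames`, the `(timestamp, can_id)` input format, and the `ratio=0.5` threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def learn_periods(frames):
    """Estimate the typical inter-arrival time of each CAN ID.

    frames: iterable of (timestamp_seconds, can_id) tuples in time order
            (hypothetical format; real datasets need their own parser).
    Returns a dict mapping can_id to the median gap between its frames.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for ts, can_id in frames:
        if can_id in last_seen:
            gaps[can_id].append(ts - last_seen[can_id])
        last_seen[can_id] = ts
    return {can_id: median(g) for can_id, g in gaps.items()}


def flag_fast_frames(frames, periods, ratio=0.5):
    """Flag frames that follow the previous frame of the same ID too quickly.

    A frame is flagged when its gap is below ratio * learned period.
    IDs that were never seen in the ambient data are ignored here.
    """
    last_seen = {}
    alerts = []
    for ts, can_id in frames:
        period = periods.get(can_id)
        prev = last_seen.get(can_id)
        if period is not None and prev is not None and ts - prev < ratio * period:
            alerts.append((ts, can_id))
        last_seen[can_id] = ts
    return alerts


# Ambient traffic: ID 0x100 sent every 10 ms.
ambient = [(i * 0.01, 0x100) for i in range(100)]
periods = learn_periods(ambient)

# Attack capture: one extra frame injected at t = 0.012 s.
attack = [(0.00, 0x100), (0.01, 0x100), (0.012, 0x100), (0.02, 0x100), (0.03, 0x100)]
alerts = flag_fast_frames(attack, periods)
```

A detector of this kind only catches attacks that raise the message frequency of an ID, which is why datasets containing only such attacks are a weak benchmark for more capable IDSs.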