Table 7. Comparison with datasets available in the literature.
| Dataset | Number of functions | Evaluated tasks | Open source |
|---|---|---|---|
| BinKit Kim et al. (2020) | 75,230,573 | ① | Yes |
| In nomine function Artuso et al. (2021) | 8,861,407 | ④ | Yes |
| Diff Liu et al. (2018) | 4,979,586 | ① | Yes |
| BinBench | 4,408,191 | ①, ②, ③, ④, ⑤ | Yes |
| SAFE Massarelli et al. (2019b) | 548,133 ①, 581,640 ②, 1,587,648 ③ | ①, ②, ③ | Yes |
| Graph embedding NNs Massarelli et al. (2019a) | 95,535 ①, 2,040,246 ③ | ①, ③ | Yes |
| Toolchain provenance Rosenblum, Miller & Zhu (2011) | 955,000 | ③ | No |
| Asm2Vec Ding, Fung & Charland (2019) | 139,936 | ② | No |
| Gemini Xu et al. (2017) | 129,365 | ① | No |
| Eklavya Chua et al. (2017) | 119,352 | ⑤ | Yes |
| NERO David, Alon & Yahav (2020) | 67,246 | ④ | Yes |
| Debin He et al. (2018) | 238 | ④ | No |

Note: Evaluated tasks: ①, binary similarity; ②, function search; ③, compiler provenance; ④, function naming; ⑤, signature recovery.