Table 20.
IOC category extraction performance and proposed remediation strategies.
| IOC Category | Accuracy (from Table 10) | Primary Extraction Method | Key Challenge | Proposed Remediation | Expected Impact |
|---|---|---|---|---|---|
| IP Addresses | 95% | Regex primary | Fixed format variations | Current pipeline sufficient | Maintain high performance |
| Domains/URLs | 92% | Regex + spaCy ruler | Minor typos/subdomain variations | Current pipeline sufficient | Maintain high performance |
| Hashes (MD5/SHA) | ~90% (implied) | Regex primary | Format consistency | Current pipeline sufficient | Maintain high performance |
| Malware Names | 85% | BERT contextual + spaCy | Polymorphic variants and aliases | Extended BERT fine-tuning on variant-rich datasets + malware family ontology mapping | +8–12% accuracy (estimated) |
| Attack Techniques | 78% | BERT primary | Ambiguous narrative descriptions | MITRE ATT&CK ontology integration (entity ruler augmentation + contextual linking) + hybrid symbolic–neural disambiguation | +10–15% accuracy (estimated) |
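
To illustrate why the regex-primary categories in the table need no remediation, the following minimal sketch shows what such an extractor might look like for IP addresses and hashes. The patterns, the `refang` helper, and the `extract_iocs` function are hypothetical and are not taken from the pipeline described here; the sketch assumes only IPv4 addresses and MD5/SHA-1/SHA-256 digests, and it uses the Python standard library rather than the spaCy components named in the table.

```python
import ipaddress
import re

# Hypothetical patterns for illustration; the pipeline's actual expressions are not given.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Longest alternative first, so a SHA-256 digest is never read as a shorter hash.
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
HASH_TYPES = {32: "MD5", 40: "SHA-1", 64: "SHA-256"}


def refang(text):
    """Undo two common defanging conventions (assumed, not exhaustive)."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_iocs(text):
    """Return IPv4 addresses and hex digests found in a report snippet."""
    text = refang(text)

    ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            # Reject syntactically plausible but invalid addresses such as 999.1.1.1.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        ips.append(candidate)

    # Label each digest by its length and normalize it to lowercase.
    hashes = [(HASH_TYPES[len(h)], h.lower()) for h in HASH_RE.findall(text)]
    return {"ip": ips, "hash": hashes}


sample = "C2 at 10.0.0[.]5 (not 999.1.1.1) dropped d41d8cd98f00b204e9800998ecf8427e"
result = extract_iocs(sample)
```

Because these indicators follow rigid formats, a pattern plus a validation step covers most cases. Malware names and attack techniques have no such fixed surface form, which is consistent with the table's proposal to rely on contextual models and ontology mapping for those categories.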