Table 18.
Research questions and mapping to results.
| Research question | Direct mapping to results |
|---|---|
| RQ1: How can hybrid NLP-ML techniques reliably extract and classify IOCs and attack patterns from unstructured CTI reports, overcoming limitations of static indicators? | BERT-spaCy-regex hybrid extracts IOCs (IPs: 95%, domains: 92%, malware: 85%) from unstructured data (Table 10; Fig. 6). Classifies patterns with 95.7% F1-score, reducing false positives by 22% via kernel smoothing. Validated on CIC-IDS2017/UNSW-NB15 (CRI = 0.999 for generalization). |
| RQ2: How can blockchain-integrated ledgers ensure tamper-proof, traceable CTI sharing across organizations while addressing trust, privacy, and interoperability challenges? | A lightweight blockchain ledger ensures integrity across 20 blocks (Table 6), with tamper detection at block 1. It enables secure dissemination, reducing single-point failures and insider threats. A 55% latency improvement supports high-throughput sharing in finance, healthcare, and IoT. |
| RQ3: How can confidence-weighted ensembles and adaptive ML mitigate class imbalance in cybersecurity datasets to improve threat prediction accuracy and generalization? | The confidence-weighted ensemble (w_i = confidence(m_i) / ∑_j confidence(m_j)) handles imbalance, yielding a 3% F1 gain and an 18-percentage-point accuracy rise (75% → 93%, Table 1). t-tests (t = 3.45–4.12, p < 0.001) and Cohen’s d (0.76–1.12) confirm robustness (Table 11). CRI = 0.999 outperforms baselines (LSTM: 0.998), enabling prediction in noisy IoT/financial streams. |
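
The weighting rule quoted for RQ3, w_i = confidence(m_i) / ∑_j confidence(m_j), can be illustrated with a minimal sketch. The table does not define how confidence(m_i) is computed, so the sketch assumes one non-negative scalar per model (for example, a validation score); the function names `confidence_weights` and `ensemble_predict` and the sample numbers are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def confidence_weights(confidences: Sequence[float]) -> List[float]:
    """Normalize per-model confidences so the weights sum to 1."""
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    return [c / total for c in confidences]


def ensemble_predict(
    probas: Sequence[Sequence[float]],
    confidences: Sequence[float],
) -> Tuple[int, List[float]]:
    """Combine per-model class probabilities for one sample.

    probas[i][k] is model i's probability for class k (assumed input shape).
    Returns the winning class index and the weighted class scores.
    """
    weights = confidence_weights(confidences)
    n_classes = len(probas[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, probas))
        for k in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda k: combined[k])
    return label, combined


# Illustrative values only: three models, two classes (0 = benign, 1 = attack).
label, scores = ensemble_predict(
    probas=[[0.2, 0.8], [0.7, 0.3], [0.6, 0.4]],
    confidences=[0.9, 0.6, 0.5],
)
print(label)  # class 1: the most confident model outweighs the other two
```

In this example an unweighted majority vote would pick class 0, since two of the three models favor it; the normalized weights (0.45, 0.30, 0.25) shift the decision toward the model with the highest confidence, which is the behavior the table attributes to the ensemble on minority classes.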