| Algorithm 1: ViT-DCNN (Vision Transformer with Deformable Convolution) for Lung and Colon Cancer Classification |
| 1: Input: $D = \{(X_i, Y_i)\}$, $\alpha$, $T$, $B$, $\theta_{\mathrm{ViT}}$, $\theta_{\mathrm{DConv}}$, $N$ |
| 2: Initialize: $\theta_{\mathrm{ViT}}$, $\theta_{\mathrm{DConv}}$ |
| 3: for epoch = 1 to T do |
| 4: for batch = 1 to $\lceil N/B \rceil$ do |
| 5: Extract mini-batch: |
| 6:
|
| 7: Apply Data Augmentation: |
| 8:
|
| 9: Vision Transformer (ViT) Forward Pass: |
| 10: Patch Embedding: |
| 11:
|
| 12: Positional Encoding: |
| 13: $Z_i = P_i + PE_i$ |
| 14: Multi-Head Self-Attention: |
| 15:
|
| 16: Feed-Forward Network: |
| 17: $\mathrm{FFN}(Z) = \max(0,\, Z W_1 + b_1)\, W_2 + b_2$ |
| 18: Deformable Convolution Forward Pass: |
| 19: Deformable Convolution: |
| 20:
|
| 21: Offset Learning: |
| 22:
|
| 23: Spatial Attention: |
| 24:
|
| 25: Hierarchical Feature Fusion (HFF): |
| 26: Concatenate Vision Transformer and Deformable CNN Features: |
| 27: $F_{\mathrm{concat}} = \mathrm{concat}(F_{\mathrm{ViT}}, F_{\mathrm{DConv}})$ |
| 28: Squeeze-and-Excitation (SE) Block: |
| 29:
|
| 30: Refined Feature Map: |
| 31:
|
| 32: Prediction and Softmax Activation: |
| 33: Global Average Pooling: |
| 34:
|
| 35: Softmax Layer: |
| 36:
|
| 37: Predicted Class: |
| 38:
|
| 39: Compute Loss: |
| 40: Cross-Entropy Loss: |
| 41:
|
| 42: Gradient Computation: |
| 43:
|
| 44: Parameter Update (Using AdamW optimizer): |
| 45: Update the Vision Transformer parameters: |
| 46:
|
| 47: Update the Deformable CNN parameters: |
| 48:
|
| 49: end for |
| 50: end for |
| 51: Output: |
| 52: Trained ViT-Deformable CNN model with updated parameters $\theta_{\mathrm{ViT}}$ and $\theta_{\mathrm{DConv}}$ |
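
Steps 13, 17 and 27 of Algorithm 1 are simple enough to trace by hand. The following is a minimal pure-Python sketch of those three operations on toy-sized nested lists. The function names (`matmul`, `add_positional_encoding`, `ffn`, `concat_features`) and all numeric values are hypothetical and chosen only for illustration; a real model would use a tensor library and learned weights.

```python
# Toy sketch of steps 13, 17 and 27. All names and values are illustrative assumptions.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_positional_encoding(patches, pos_enc):
    """Step 13: Z_i = P_i + PE_i, applied to every patch embedding."""
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(patches, pos_enc)]

def ffn(z, w1, b1, w2, b2):
    """Step 17: FFN(Z) = max(0, Z W1 + b1) W2 + b2 (ReLU between two linear maps)."""
    hidden = [[max(0.0, h + b) for h, b in zip(row, b1)] for row in matmul(z, w1)]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, w2)]

def concat_features(f_vit, f_dconv):
    """Step 27: concatenate the ViT and deformable-CNN feature vectors."""
    return f_vit + f_dconv

# Two patch embeddings of width 2, plus their positional encodings.
z = add_positional_encoding([[1.0, 2.0], [3.0, 4.0]], [[0.5, -0.5], [0.0, 1.0]])
# Hidden width 2, output width 1; the second hidden unit is zeroed by the ReLU.
out = ffn(z, [[1.0, -1.0], [0.0, 0.5]], [0.0, 0.0], [[1.0], [2.0]], [0.5])
print(z)    # [[1.5, 1.5], [3.0, 5.0]]
print(out)  # [[2.0], [3.5]]
```

In this example the second hidden unit has a negative pre-activation for both tokens, so `max(0, ·)` clips it to zero and only the first unit contributes to the output.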
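
The prediction and loss stage (steps 32 to 41) can be illustrated in the same spirit. The sketch below assumes the standard definitions of global average pooling, softmax, arg-max prediction and categorical cross-entropy; the algorithm listing names these operations but the exact notation used by the authors is not shown, so the function names and inputs here are hypothetical. For brevity the pooled vector is fed straight into the softmax, whereas an actual classifier would presumably apply a dense layer first.

```python
import math

def global_average_pool(feature_map):
    """Average each channel's H x W grid down to a single scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(probs):
    """Index of the largest probability (arg-max)."""
    return max(range(len(probs)), key=probs.__getitem__)

def cross_entropy(probs, target):
    """Cross-entropy for one sample with an integer class label."""
    return -math.log(probs[target])

# Two channels of size 2 x 2 (illustrative values only).
feature_map = [[[0.0, 2.0], [2.0, 0.0]],
               [[3.0, 3.0], [3.0, 3.0]]]
pooled = global_average_pool(feature_map)   # one scalar per channel
probs = softmax(pooled)
predicted = predict_class(probs)
loss = cross_entropy(probs, target=1)
print(pooled, predicted)  # [1.0, 3.0] 1
```

Subtracting the maximum logit inside `softmax` leaves the result unchanged mathematically but avoids overflow in `math.exp` for large activations.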
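
Steps 44 to 48 state that both parameter sets are updated with the AdamW optimizer. As a rough illustration, the sketch below implements one AdamW update for a single scalar parameter, following the usual formulation with bias-corrected moment estimates and decoupled weight decay. The function name `adamw_step` and the default hyperparameters are assumptions made for this example; the listing itself only specifies the learning rate α.

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a scalar parameter at step t (t starts at 1).

    Returns the updated parameter and the new first/second moment estimates.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: the decay term is added outside the adaptive ratio.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

# First update of a parameter theta = 1.0 with gradient 2.0 (illustrative values).
theta, m, v = adamw_step(1.0, 2.0, m=0.0, v=0.0, t=1, lr=0.1, weight_decay=0.1)
print(round(theta, 4))  # 0.89
```

In the algorithm the same rule would be applied element-wise to every entry of the ViT and deformable-CNN parameter tensors, each with its own moment estimates. On the first step the bias-corrected ratio has magnitude close to 1 regardless of the gradient scale, which is why the example moves the parameter by roughly `lr` plus the decay term.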