Algorithm 1 Two-Stream Network for Activity Recognition with Fusion

Require:
    Skeleton data (2D pose coordinates, angles, distances)
    RGB video

procedure TwoStreamActivityRecognition(skeletonData, rgbVideo)
    Stage 1: Skeleton Stream
        1. Uniformly sample 10 frames from each video in the dataset.
        2. For each frame:
            a. Extract 2D pose coordinates.
            b. Normalize the keypoints.
            c. Calculate joint angles and distances.
        3. Apply feature selection using FFS.
        4. Store the preprocessed data as Xs.
        5. Feed Xs into a Long Short-Term Memory (LSTM) network.
        6. Output: probabilities for each activity class, Prob_1.
    Stage 2: RGB Stream
        1. Feed the RGB video into a (2+1)D Convolutional Neural Network (CNN):
            - 2D spatial convolutions for feature extraction.
            - 1D temporal convolutions for capturing temporal dependencies.
        2. Output: probabilities for each activity class, Prob_2.
    Stage 3: Fusion
        Fuse the two streams using the chosen methods (e.g., addition, multiplication):
            a. Prob_Fusion1 = Prob_1 + Prob_2 (addition)
            b. Prob_Fusion2 = Prob_1 × Prob_2 (element-wise multiplication)
    Stage 4: Decision
        For each fusion method, predict the class with the highest fused score:
            a. Pred_Fusion1 = argmax(Prob_Fusion1)
            b. Pred_Fusion2 = argmax(Prob_Fusion2)
end procedure
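The model-independent steps of Algorithm 1 (uniform frame sampling, joint-angle computation, and the late fusion and decision stages) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `uniform_sample_indices`, `joint_angle`, and `fuse_and_decide` are hypothetical, the angle is assumed to be the planar angle at the middle keypoint of a three-joint chain, and `prob_1` and `prob_2` are assumed to be per-class probability vectors of equal length produced by the LSTM and the (2+1)D CNN, respectively.

```python
import numpy as np


def uniform_sample_indices(num_frames: int, num_samples: int = 10) -> np.ndarray:
    """Return evenly spaced frame indices spanning the whole clip (Stage 1, step 1)."""
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    # linspace includes both endpoints; rounding maps positions to valid frame indices.
    return np.linspace(0, num_frames - 1, num=num_samples).round().astype(int)


def joint_angle(a, b, c) -> float:
    """Angle in degrees at keypoint b formed by the 2D segments b->a and b->c.

    Assumption: this is one plausible definition of a "joint angle";
    the source does not specify the exact formula.
    """
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cross = v1[0] * v2[1] - v1[1] * v2[0]  # z-component of the 2D cross product
    dot = float(np.dot(v1, v2))
    # arctan2 on (|cross|, dot) is numerically stable and yields a value in [0, 180].
    return float(np.degrees(np.arctan2(abs(cross), dot)))


def fuse_and_decide(prob_1, prob_2) -> dict:
    """Late fusion of two class-probability vectors (Stages 3 and 4)."""
    p1 = np.asarray(prob_1, dtype=float)
    p2 = np.asarray(prob_2, dtype=float)
    if p1.shape != p2.shape:
        raise ValueError("both streams must score the same set of classes")
    fused_add = p1 + p2  # Prob_Fusion1: addition
    fused_mul = p1 * p2  # Prob_Fusion2: element-wise multiplication
    # The predicted class is the index of the largest fused score.
    return {
        "addition": int(np.argmax(fused_add)),
        "multiplication": int(np.argmax(fused_mul)),
    }


if __name__ == "__main__":
    print(uniform_sample_indices(100))
    print(joint_angle((1, 0), (0, 0), (0, 1)))
    print(fuse_and_decide([0.7, 0.3, 0.0], [0.05, 0.3, 0.65]))
```

The two fusion rules can disagree: addition rewards a class that one stream scores very highly, whereas multiplication favors classes on which both streams agree, since a near-zero score from either stream suppresses the product.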