Algorithm 2: Spatial-Semantic Decoder (SSD)
 1: Input: Test keyframe, Reference set, Detection threshold, Similarity threshold θ
 2: Output: Educational scene graph G
 3: // Step 1: Open-World Object Proposal Model
 4:     …
 5: // Initialize reference embeddings
 6:     …
 7: for each … do
 8:     …
 9:     Store … in reference database
10: end for
11: Initialize B ← ∅ // Set of detected bounding boxes
12: for each patch embedding … do
13:     for each … do
14:         Compute similarity ← …
15:         if similarity > … then
16:             …
17:             …
18:             break
19:         end if
20:     end for
21: end for
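The Step 1 proposal loop (steps 11–21) matches each patch embedding against the stored reference embeddings and keeps the patch's bounding box when a match clears the similarity threshold. A minimal Python sketch follows; the cosine-similarity metric and all names (`propose_objects`, `reference_db`, the `(box, label, score)` tuple layout) are illustrative assumptions, since the algorithm's exact expressions were lost:

```python
import numpy as np

def propose_objects(patch_embeddings, patch_boxes, reference_db, sim_threshold):
    """Step-1 sketch: each patch embedding is compared against every
    reference embedding; the first reference whose similarity exceeds
    the threshold claims the patch, and its box is added to B."""
    B = []  # set of detected bounding boxes (step 11)
    for emb, box in zip(patch_embeddings, patch_boxes):
        for label, ref_emb in reference_db.items():
            # cosine similarity -- one plausible choice of metric
            sim = float(np.dot(emb, ref_emb)
                        / (np.linalg.norm(emb) * np.linalg.norm(ref_emb) + 1e-8))
            if sim > sim_threshold:
                B.append((box, label, sim))
                break  # mirrors the `break` at step 18: first match wins
    return B
```

The early `break` means each patch contributes at most one detection, matching the control flow of steps 15–19.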
22: // Step 2: Pixel-Accurate Mask Refiner
23: Initialize M ← ∅
24: for each … do
25:     …
26:     …
27:     …
28: end for
29: // Step 3: Entropy reduction, verification, semantic graph formation, and annotation
30: Initialize G ← ConstructSceneGraph(M) using verified region labels
31: for each … do
32:     …
33:     …
34:     …
35:     …
36:     for each … do
37:         …
38:         if … then
39:             …
40:             …
41:         end if
42:     end for
43:     if … then
44:         …
45:     else
46:         …
47:     end if
48: end for
49: return G
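ConstructSceneGraph (step 30) is not spelled out above. A minimal sketch under stated assumptions: one node per verified region label from M, and directed edges from simple spatial predicates between region bounding boxes. The `left_of`/`above` relation set and all names here are hypothetical, not the paper's definition:

```python
def construct_scene_graph(masks):
    """Sketch of ConstructSceneGraph: nodes carry verified region
    labels; edges encode assumed pairwise spatial relations computed
    from bounding-box centers.

    `masks` is a list of (label, box) pairs with box = (x_min, y_min,
    x_max, y_max)."""
    nodes = [{"id": i, "label": label, "box": box}
             for i, (label, box) in enumerate(masks)]
    edges = []
    for a in nodes:
        for b in nodes:
            if a["id"] == b["id"]:
                continue
            ax = (a["box"][0] + a["box"][2]) / 2  # center x of region a
            bx = (b["box"][0] + b["box"][2]) / 2
            ay = (a["box"][1] + a["box"][3]) / 2  # center y of region a
            by = (b["box"][1] + b["box"][3]) / 2
            # assumed spatial predicates; the actual relation set is unknown
            if ax < bx:
                edges.append((a["id"], "left_of", b["id"]))
            if ay < by:
                edges.append((a["id"], "above", b["id"]))
    return {"nodes": nodes, "edges": edges}
```

The inner loop over node pairs mirrors the nested loop of steps 36–42, where candidate relations are tested before being committed to G.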