2025 Dec 19;28(1):3. doi: 10.3390/e28010003
Algorithm 2: Spatial-Semantic Decoder (SSD)
  • 1:

    Input: Test keyframe $I_t$, reference set $R = \{(\text{region}_1, \text{label}_1), \ldots, (\text{region}_k, \text{label}_k)\}$, detection threshold $\tau_{\text{detection}}$, similarity threshold $\theta$

  • 2:

    Output: Educational scene graph G

  • 3:

    // Step 1: Open-World Object Proposal Model

  • 4:

    $E_t \leftarrow$ EncodeImage($I_t$)

  • 5:

    // Initialize reference embeddings

  • 7:

    for each $(\text{region}_r, \text{label}_r) \in R$ do

  • 8:

        $E_r \leftarrow$ EncodeImage($\text{region}_r$)

  • 9:

        Store $(E_r, \text{label}_r)$ in reference database $D_{\text{ref}}$

  • 10:

    end for

  • 11:

    Initialize $B \leftarrow \emptyset$ // Set of detected bounding boxes

  • 12:

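    for each patch embedding $v_j^t$ do

  • 13:

        for each $(E_r, \text{label}_r) \in D_{\text{ref}}$ do

  • 14:

            Compute similarity $\leftarrow \dfrac{v_j^t \cdot E_r}{\lVert v_j^t \rVert \, \lVert E_r \rVert}$

  • 15:

            if similarity $> \tau_{\text{detection}}$ then

  • 16:

                bbox $\leftarrow$ LocalizeRegion($v_j^t$)

  • 17:

                $B \leftarrow B \cup \{\text{bbox}\}$

  • 18:

                break

  • 19:

            end if

  • 20:

        end for

  • 21:

    end for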

  • 22:

    // Step 2: Pixel-Accurate Mask Refiner

  • 23:

    Initialize $M \leftarrow \emptyset$

  • 24:

    for each bbox $\in B$ do

  • 25:

        mask $\leftarrow$ RefineSegmentation($I_t$, bbox)

  • 26:

        refined_bbox $\leftarrow$ BoundingBoxFromMask(mask)

  • 27:

        $M \leftarrow M \cup \{(\text{refined\_bbox}, \text{mask})\}$

  • 28:

    end for

  • 29:

    // Step 3: Entropy reduction, verification, semantic graph formation, and annotation

  • 30:

    Initialize $G \leftarrow$ ConstructSceneGraph($M$) using verified region labels

  • 31:

    for each component $c \in G$ do

  • 32:

        candidate_region $\leftarrow$ ExtractRegion($I_t$, c.bbox)

  • 33:

        embedding_candidate $\leftarrow$ Embed(candidate_region)

  • 34:

        best_similarity $\leftarrow -\infty$

  • 35:

        best_label $\leftarrow$ UNDEFINED

  • 36:

        for each $(E_r, \text{label}_r) \in D_{\text{ref}}$ do

  • 37:

            similarity $\leftarrow$ Similarity(embedding_candidate, $E_r$)

  • 38:

            if similarity $>$ best_similarity then

  • 39:

                best_similarity $\leftarrow$ similarity

  • 40:

                best_label $\leftarrow$ $\text{label}_r$

  • 41:

            end if

  • 42:

        end for

  • 43:

        if best_similarity $> \theta$ then

  • 44:

            c.label $\leftarrow$ best_label

  • 45:

        else

  • 46:

            c.label $\leftarrow$ InitialLabel(candidate_region)

  • 47:

        end if

  • 48:

    end for

  • 49:

    return G
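
The two matching loops of Algorithm 2 (the proposal loop in steps 12 to 21 and the label verification in steps 34 to 47) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: embeddings are plain lists of floats, the helper names `cosine_similarity`, `propose_patches`, and `verify_label` are hypothetical, and the labels and vectors in the demo are toy values. The image encoder, region localization, mask refinement, and scene-graph construction are omitted because the algorithm does not specify them; a matching patch index stands in for a localized bounding box.

```python
import math
from typing import List, Optional, Sequence, Tuple

Embedding = Sequence[float]
RefDB = List[Tuple[Embedding, str]]  # D_ref as (E_r, label_r) pairs


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Return (a . b) / (||a|| ||b||), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def propose_patches(
    patches: Sequence[Embedding], d_ref: RefDB, tau_detection: float
) -> List[int]:
    """Steps 12-21: indices of patches that match some reference embedding."""
    hits: List[int] = []
    for j, v in enumerate(patches):
        for e_r, _label in d_ref:
            if cosine_similarity(v, e_r) > tau_detection:
                hits.append(j)  # stand-in for bbox <- LocalizeRegion(v)
                break           # one match per patch is enough
    return hits


def verify_label(
    candidate: Embedding, d_ref: RefDB, theta: float, initial_label: str
) -> str:
    """Steps 34-47: nearest reference label if above theta, else the initial label."""
    best_similarity = -math.inf
    best_label: Optional[str] = None
    for e_r, label_r in d_ref:
        similarity = cosine_similarity(candidate, e_r)
        if similarity > best_similarity:
            best_similarity, best_label = similarity, label_r
    if best_label is not None and best_similarity > theta:
        return best_label
    return initial_label


# Toy demo with two hypothetical reference embeddings.
d_ref: RefDB = [([1.0, 0.0], "diagram"), ([0.0, 1.0], "equation")]
patches = [[0.9, 0.1], [0.5, 0.5], [0.0, 2.0]]
print(propose_patches(patches, d_ref, tau_detection=0.9))
print(verify_label([0.9, 0.1], d_ref, theta=0.8, initial_label="unknown"))
```

Because the proposal loop breaks on the first reference that exceeds the detection threshold, it only decides whether a patch is worth localizing. The verification loop instead scans the whole reference database and keeps the best match, which is why the two steps use separate thresholds.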