Entropy. 2025 Jul 24;27(8):784. doi: 10.3390/e27080784
Algorithm 2 Information-Theoretic Feature Extraction and Detection
Require: text, LM, EMB, N, τ, NGram, Classifier, k
Ensure: is_watermarked, confidence
 1:  // Probability Curvature Features
 2:  P_orig ← log P(text | LM)
 3:  C ← [ ]    ▹ Curvature values
 4:  for i = 1 to N do
 5:      text_i ← SemanticPerturb(text)    ▹ Random synonym replacement, preserve structure
 6:      P_i ← log P(text_i | LM)
 7:      C.append(P_orig − P_i)
 8:  end for
 9:  μ_C, σ_C, skew_C, kurt_C ← Statistics(C)
10:  // Information-Theoretic Features
11:  H_avg ← AverageEntropy(text, LM)
12:  I_mutual ← MutualInformation(text_words, text_chars)
13:  D_KL ← KLDivergence(P_text, P_reference)
14:  PPL ← 2^(H_avg)    ▹ Perplexity from entropy
15:  // Watermark Detection Features
16:  Λ ← 0    ▹ Log-likelihood ratio
17:  green_scores ← [ ]
18:  for each token t_i in text do
19:      N_τ(t_{i−1}) ← SemanticNeighbors(t_{i−1}, τ)
20:      GreenList ← Partition(N_τ(t_{i−1}), h)[0]
21:      if t_i ∈ GreenList then
22:          Λ ← Λ + log(|GreenList| / |N_τ|)
23:          green_scores.append(cos(e_{t_i}, Mean(GreenList)))
24:      else
25:          Λ ← Λ − log(1 − |GreenList| / |N_τ|)
26:      end if
27:  end for
28:  ρ_observed ← len(green_scores) / len(text)
29:  // Feature Aggregation
30:  f_curve ← [μ_C, σ_C, skew_C, kurt_C]
31:  f_info ← [H_avg, I_mutual, D_KL, PPL]
32:  f_watermark ← [Λ, ρ_observed, Mean(green_scores), Std(green_scores)]
33:  features ← [f_curve, f_info, f_watermark]
34:  // Classification with Confidence
35:  p(watermarked | features) ← Classifier(features)
36:  confidence ← 2 · |p(watermarked | features) − 0.5|
37:  is_watermarked ← p(watermarked | features) > 0.5
38:  return is_watermarked, confidence
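
For concreteness, the sketch below renders Algorithm 2 in Python. It is a minimal sketch, not the paper's implementation: every model-facing helper it accepts (log_prob, semantic_perturb, avg_entropy, mutual_info, kl_divergence, semantic_neighbors, partition, embed, classifier) is a hypothetical stand-in for LM, EMB, and Classifier from the Require line, and the numerical clamp on the green-list ratio is an added guard not present in the pseudocode.

# A minimal sketch of Algorithm 2. All callables passed in are assumed
# placeholders; only the control flow follows the pseudocode above.
import math
from typing import Callable, List, Tuple

import numpy as np
from scipy import stats


def extract_and_detect(
    text: str,
    log_prob: Callable[[str], float],                 # log P(text | LM)
    semantic_perturb: Callable[[str], str],           # synonym-level rewrite
    avg_entropy: Callable[[str], float],              # H_avg under LM
    mutual_info: Callable[[str], float],              # I(words; chars)
    kl_divergence: Callable[[str], float],            # D_KL(P_text || P_ref)
    semantic_neighbors: Callable[[str, float], List[str]],   # N_tau(t)
    partition: Callable[[List[str], int], List[List[str]]],  # hash split
    embed: Callable[[str], np.ndarray],               # EMB
    classifier: Callable[[np.ndarray], float],        # p(watermarked | f)
    n_perturbations: int = 10,                        # N
    tau: float = 0.8,                                 # τ
    h: int = 2,                                       # number of partitions
) -> Tuple[bool, float]:
    # Probability curvature features (steps 1-9)
    p_orig = log_prob(text)
    c = np.array([p_orig - log_prob(semantic_perturb(text))
                  for _ in range(n_perturbations)])
    f_curve = [c.mean(), c.std(), stats.skew(c), stats.kurtosis(c)]

    # Information-theoretic features (steps 10-14)
    h_avg = avg_entropy(text)
    f_info = [h_avg, mutual_info(text), kl_divergence(text), 2.0 ** h_avg]

    # Watermark-detection features (steps 15-28): walk consecutive token
    # pairs, rebuild the green list from the previous token's neighbors,
    # and accumulate the log-likelihood ratio Λ.
    tokens = text.split()
    llr, green_scores = 0.0, []
    for prev, tok in zip(tokens, tokens[1:]):
        neighbors = semantic_neighbors(prev, tau)
        if not neighbors:
            continue
        green = partition(neighbors, h)[0]
        # Numerical guard (not in the pseudocode): keep the ratio in (0, 1).
        gamma = min(max(len(green) / len(neighbors), 1e-9), 1.0 - 1e-9)
        if tok in green:
            llr += math.log(gamma)
            centroid = np.mean([embed(g) for g in green], axis=0)
            e = embed(tok)
            green_scores.append(float(
                e @ centroid / (np.linalg.norm(e) * np.linalg.norm(centroid))))
        else:
            llr -= math.log(1.0 - gamma)
    rho = len(green_scores) / max(len(tokens), 1)
    gs = np.array(green_scores) if green_scores else np.zeros(1)
    f_wm = [llr, rho, gs.mean(), gs.std()]

    # Aggregation and classification (steps 29-38)
    features = np.array(f_curve + f_info + f_wm, dtype=float)
    p = classifier(features)
    return bool(p > 0.5), float(2.0 * abs(p - 0.5))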
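As a usage illustration only, the toy call below wires the sketch to trivial stubs; the constant LM score, the hash-parity partition, and the logistic read-out on Λ are all invented here and bear no relation to the components the paper actually uses.

# Toy invocation of the sketch with stub components, for illustration only.
import hashlib
import math
import random

import numpy as np

rng = random.Random(0)
VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon"]

def stable_hash(w: str) -> int:
    return int(hashlib.sha256(w.encode()).hexdigest(), 16)

def perturb(t: str) -> str:
    words = t.split()
    words[rng.randrange(len(words))] = rng.choice(VOCAB)
    return " ".join(words)

def embed(tok: str) -> np.ndarray:
    g = np.random.default_rng(stable_hash(tok) % 2**32)
    return g.standard_normal(8)

is_wm, conf = extract_and_detect(
    text="alpha beta gamma delta",
    log_prob=lambda t: -0.5 * len(t.split()),      # stub LM score
    semantic_perturb=perturb,
    avg_entropy=lambda t: 4.2,                     # stub H_avg (bits)
    mutual_info=lambda t: 0.3,
    kl_divergence=lambda t: 0.1,
    semantic_neighbors=lambda tok, tau: VOCAB,
    partition=lambda items, h: [[w for w in items if stable_hash(w) % h == b]
                                for b in range(h)],
    embed=embed,
    classifier=lambda f: 1.0 / (1.0 + math.exp(-f[8])),  # logistic on Λ
    n_perturbations=5,
)
print(is_wm, round(conf, 3))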