Algorithm 2 Stage II: Prior-Guided Affective Diffusion Optimization

Input:
  Frozen modules: JSCC Encoder/Decoder, Semantic Encoder.
  Trainable modules: Structural Prior Network (including Duration Predictor), Diffusion U-Net.
  Data: speech dataset, ground-truth Mel-spectrogram.
  Hyperparameters: loss weights, noise schedule.
Output: Optimized generative parameters.

1: Initialize the trainable parameters randomly.
2: repeat
3:   for each batch do
4-5:     Digital Stream: extract and quantize semantic features.
6-8:     Analog Stream: transmit affective features via the frozen JSCC.
9:     Predict unaligned distributions.
10-11:     Monotonic Alignment Search (MAS).
12-13:     Get the aligned semantic skeleton and durations.
14-15:     Compute the structural losses.
16:     Sample time and noise.
17-18:     Sample the noisy state via the OU forward process anchored at the structural prior.
19-20:     Predict the noise residual with Affective FiLM conditioning.
21-22:     Compute the diffusion loss.
23-24:     Aggregate the total loss.
25:     Update the trainable parameters with a gradient step on the total loss.
26:   end for
27: until convergence
28: return the optimized generative parameters
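
The alignment and forward-noising steps (10-13 and 17-18) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that MAS is the dynamic-programming search introduced in Glow-TTS, and that the OU forward process has the Grad-TTS-style closed form with a linear noise schedule; neither assumption is confirmed by the listing. All names (`monotonic_alignment_search`, `expand_by_durations`, `ou_forward_sample`, `beta_min`, `beta_max`) and the default schedule values are hypothetical, and each frame is reduced to a single scalar for brevity.

```python
import math
import random


def monotonic_alignment_search(log_lik):
    """Assumed Glow-TTS-style MAS.

    log_lik[i][j] is the log-likelihood of frame j under the prior of token i.
    Returns the number of frames assigned to each token (the durations).
    """
    n_tok, n_frm = len(log_lik), len(log_lik[0])
    if n_tok > n_frm:
        raise ValueError("need at least one frame per token")
    neg_inf = float("-inf")
    # best[i][j]: best cumulative log-likelihood of a monotonic path ending at (i, j)
    best = [[neg_inf] * n_frm for _ in range(n_tok)]
    best[0][0] = log_lik[0][0]
    for j in range(1, n_frm):
        for i in range(min(j + 1, n_tok)):
            stay = best[i][j - 1]
            move = best[i - 1][j - 1] if i > 0 else neg_inf
            best[i][j] = max(stay, move) + log_lik[i][j]
    # Backtrack from the last token and last frame to recover the durations.
    durations = [0] * n_tok
    i = n_tok - 1
    for j in range(n_frm - 1, -1, -1):
        durations[i] += 1
        if i > 0 and (i == j or best[i - 1][j - 1] > best[i][j - 1]):
            i -= 1
    return durations


def expand_by_durations(token_values, durations):
    """Repeat each token-level value for its duration to get a frame-level sequence."""
    frames = []
    for value, d in zip(token_values, durations):
        frames.extend([value] * d)
    return frames


def ou_forward_sample(x0, mu, t, beta_min=0.05, beta_max=20.0, rng=random):
    """Assumed closed-form OU forward sample anchored at mu (linear schedule).

    x_t = mu + (x0 - mu) * exp(-0.5 * B(t)) + sqrt(1 - exp(-B(t))) * eps,
    where B(t) is the integral of the linear schedule from 0 to t.
    Returns (x_t, eps) so that eps can serve as the noise-prediction target.
    """
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t
    decay = math.exp(-0.5 * integral)
    std = math.sqrt(1.0 - math.exp(-integral))
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [m + (x - m) * decay + std * e for x, m, e in zip(x0, mu, eps)]
    return xt, eps


# Usage: two tokens, four frames; token 0 explains frames 0-2, token 1 explains frame 3.
log_lik = [[0.0, 0.0, 0.0, -5.0],
           [-5.0, -5.0, -5.0, 0.0]]
durations = monotonic_alignment_search(log_lik)
mu = expand_by_durations([0.5, -0.5], durations)   # frame-level anchor
x0 = [1.0, 1.0, 1.0, -1.0]                         # toy "ground-truth" frames
xt, eps = ou_forward_sample(x0, mu, t=0.3, rng=random.Random(0))
```

Under these assumptions, the sample equals the clean frames at `t = 0` and approaches the anchor plus unit-variance noise as `t` approaches 1, which is what makes the anchor a useful starting point for the reverse process.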