Author manuscript; available in PMC: 2019 Aug 9.

Published in final edited form as: Phys Rep. 2019 Mar 14;810:1–124. doi: 10.1016/j.physrep.2019.03.001

FIG. 69 — KL-divergences between the data distribution p_data and the model p_θ. Data is drawn from a bimodal Gaussian distribution with unit variances peaked at ±Δ with Δ = 2.0, and the model p_θ(x) is a Gaussian with mean zero and the same variance as p_data(x). (Top) p_data and p_θ for Δ = 2. (Bottom) D_KL(p_data||p_θ) (Data-Model) and D_KL(p_θ||p_data) (Model-Data) as a function of Δ. Notice that D_KL(p_data||p_θ) is insensitive to placing weight in the model distribution in regions where p_data ≈ 0, whereas D_KL(p_θ||p_data) punishes this harshly. This follows from how each divergence weights the log-ratio: D_KL(p_data||p_θ) = ∫ p_data log(p_data/p_θ) dx gives almost no weight to regions where p_data ≈ 0, while in D_KL(p_θ||p_data) the factor log(p_θ/p_data) grows without bound there.
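The asymmetry in the bottom panel can be reproduced numerically. The sketch below is a minimal illustration under stated assumptions, not the code used to produce the figure. It assumes the two mixture components carry equal weight 1/2, so p_data has variance 1 + Δ² and p_θ is taken to be N(0, 1 + Δ²). It evaluates both divergences with a simple Riemann sum on a uniform grid over a hypothetical window [-30, 30]. The helper names (log_p_data, log_p_model, kl, forward_kl, reverse_kl) are illustrative only. Densities are handled in log space, with a log-sum-exp for the mixture, so the log-ratio stays finite far in the tails.

```python
import math

# Assumptions (not from the figure source): equal mixture weights of 1/2, a model
# variance of 1 + delta**2, and a Riemann sum over the window [-30, 30].


def log_gauss(x: float, mu: float, var: float) -> float:
    """Log-density of a 1D Gaussian N(mu, var) at x."""
    return -((x - mu) ** 2) / (2.0 * var) - 0.5 * math.log(2.0 * math.pi * var)


def log_p_data(x: float, delta: float) -> float:
    """Log-density of the equal-weight mixture of N(+delta, 1) and N(-delta, 1)."""
    a = log_gauss(x, delta, 1.0)
    b = log_gauss(x, -delta, 1.0)
    m = max(a, b)
    # log-sum-exp keeps the result finite even where exp(a) and exp(b) would underflow
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))


def log_p_model(x: float, delta: float) -> float:
    """Log-density of the zero-mean Gaussian with the mixture's variance 1 + delta**2."""
    return log_gauss(x, 0.0, 1.0 + delta ** 2)


def kl(log_p, log_q, lo: float = -30.0, hi: float = 30.0, n: int = 20001) -> float:
    """Riemann-sum estimate of D_KL(p||q) = integral of p(x) * log(p(x)/q(x)) dx."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        lp = log_p(x)
        # the integrand is weighted by p(x), the FIRST argument of the divergence
        total += math.exp(lp) * (lp - log_q(x))
    return total * h


def forward_kl(delta: float) -> float:
    """D_KL(p_data || p_theta), the 'Data-Model' curve."""
    return kl(lambda x: log_p_data(x, delta), lambda x: log_p_model(x, delta))


def reverse_kl(delta: float) -> float:
    """D_KL(p_theta || p_data), the 'Model-Data' curve."""
    return kl(lambda x: log_p_model(x, delta), lambda x: log_p_data(x, delta))


if __name__ == "__main__":
    for delta in (0.0, 1.0, 2.0, 3.0):
        print(f"delta={delta:.1f}  data-model={forward_kl(delta):.4f}  "
              f"model-data={reverse_kl(delta):.4f}")
```

At Δ = 0 the two distributions coincide and both estimates are zero. By Δ = 2 the Model-Data value should exceed the Data-Model value, with the gap widening as Δ grows, because p_θ keeps placing mass near x = 0 where p_data becomes small.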