Skip to content


The f-divergence is defined as1

\[ \operatorname{D}_f = \int f\left(\frac{p}{q}\right) q\mathrm d\mu, \]

where \(p\) and \(q\) are two densities and \(\mu\) is a reference distribution.

Requirements on the generating function

The generating function \(f\) is required to

  • be convex, and
  • \(f(1) =0\).

For \(f(x) = x \log x\) with \(x=p/q\), f-divergence is reduced to the KL divergence

\[ \begin{align} &\int f\left(\frac{p}{q}\right) q\mathrm d\mu \\ =& \int \frac{p}{q} \log \left( \frac{p}{q} \right) \mathrm d\mu \\ =& \int p \log \left( \frac{p}{q} \right) \mathrm d\mu. \end{align} \]

For more special cases of f-divergence, please refer to wikipedia1. Nowozin 2016 also provides a concise review of f-divergence2.

  1. Contributors to Wikimedia projects. F-divergence. In: Wikipedia [Internet]. 17 Jul 2021 [cited 4 Sep 2021]. Available: 

  2. Nowozin S, Cseke B, Tomioka R. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. arXiv [stat.ML]. 2016. Available: 

Contributors: LM