Shannon Entropy

Shannon entropy \(H(p)\) is the expectation of the information content \(I(X)=-\log \left(p\right)\) [1],

\[\begin{equation} H(p) = \mathbb E_{p}\left[ -\log \left(p\right) \right]. \end{equation}\]
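For a discrete distribution the expectation above becomes a finite sum, \(H(p) = -\sum_i p_i \log p_i\). The following is a minimal sketch of that sum, assuming the distribution is given as a plain list of probabilities; the function name `shannon_entropy` and the `base` parameter are illustrative choices, not part of the definition.

```python
import math

def shannon_entropy(p, base=math.e):
    """Entropy of a discrete distribution given as a sequence of probabilities."""
    # Outcomes with zero probability are skipped: 0 * log(0) is taken to be 0.
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

# With base 2 the result is measured in bits.
fair_coin = shannon_entropy([0.5, 0.5], base=2)
biased_coin = shannon_entropy([0.9, 0.1], base=2)
certain = shannon_entropy([1.0, 0.0], base=2)
```

A fair coin carries one bit of entropy, a biased coin carries less, and a certain outcome carries none.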

Cross Entropy

The cross entropy of a distribution \(q\) relative to a distribution \(p\) is [2]

\[ H(p, q) = \mathbb E_{p} \left[ -\log q \right]. \]

Cross entropy \(H(p, q)\) can also be decomposed into an entropy term and a divergence term,

\[ H(p, q) = H(p) + \operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right), \]

where \(H(p)\) is the entropy of \(p\) and \(\operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right)\) is the KL divergence.
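The decomposition can be checked numerically. The sketch below assumes two discrete distributions over the same outcomes, with \(q_i > 0\) wherever \(p_i > 0\); the helper names and the example values of `p` and `q` are made up for illustration.

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i log p_i
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i log q_i
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions (assumed values).
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
```

Here `lhs` and `rhs` agree up to floating-point error. Because the KL divergence is non-negative, the cross entropy is never smaller than \(H(p)\), and the two are equal when \(q = p\).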

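Cross entropy is widely used as a loss function in classification problems, e.g., logistic regression, where \(p\) is the distribution given by the labels and \(q\) is the distribution predicted by the model.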
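For binary classification, the cross entropy between a 0/1 label \(y\) and a predicted probability \(\hat{q}\) reduces to \(-\left[y \log \hat{q} + (1-y)\log(1-\hat{q})\right]\). The sketch below averages this over a few samples. It assumes the model outputs logits that are passed through a sigmoid, as in logistic regression; the function names, the clipping constant `eps`, and the sample logits and labels are illustrative.

```python
import math

def sigmoid(z):
    # Maps a real-valued logit to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, q in zip(y_true, y_prob):
        # Clip to avoid log(0) when a prediction is exactly 0 or 1.
        q = min(max(q, eps), 1.0 - eps)
        total += -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
    return total / len(y_true)

# Illustrative logits and labels (assumed values).
logits = [2.0, -1.0, 0.5]
labels = [1, 0, 1]
probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probs)

# A model that always predicts 0.5 has a loss of log(2) per sample.
uninformed = binary_cross_entropy([1, 0], [0.5, 0.5])
```

Minimizing this loss pushes the predicted probabilities toward the labels, which is the same as minimizing the KL divergence term, since the entropy of the labels does not depend on the model.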

