# Information GainΒΆ

Information gain is a frequently used metric in calculating the gain during a split in tree-based methods.

First of all, the entropy of a dataset is defined as

$S = - \sum_i p_i \log p_i - \sum_i (1-p_i)\log p_i,$

where $$p_i$$ is the probability of a class.

The information gain is the change of entropy.

To illustrate this idea, we use decision tree as an example. In a decision tree algorithm, we would split a node. Before splitting, we assign a label $$m$$ to the node, the entropy is

$S_m = - p_m \log p_m - (1-p_m)\log p_m.$

After the splitting, we have two groups that contributes to the entropy, group $$L$$ and group $$R$$ 1,

$S'_m = p_L (- p_m \log p_m - (1-p_m)\log p_m) + p_R (- p_m \log p_m - (1-p_m)\log p_m),$

where $$p_L$$ and $$p_R$$ are the probabilities of the two groups. Suppose we have 100 samples before splitting and 29 samples in the left group and 71 samples in the right group, we have $$p_L = 29/100$$ and $$p_R = 71/100$$.

The information gain is the difference between $$S_m$$ and $$S'_m$$,

$\mathrm{Gain} = S_m - S'_m.$

1. Shalev-Shwartz S, Ben-David S. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014 doi:10.1017/CBO9781107298019

Contributors: LM