Information Gain
Information gain is a frequently used metric for evaluating candidate splits in tree-based methods.
First of all, the entropy of a dataset is defined as

\[
S = - \sum_i p_i \log p_i,
\]

where \(p_i\) is the probability of class \(i\).
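As a minimal sketch, the entropy of a class distribution can be computed directly from the probabilities (using base-2 logarithms, so the result is in bits; terms with \(p_i = 0\) are skipped by convention):

```python
import math

def entropy(probs):
    """Entropy S = -sum_i p_i * log2(p_i); zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # → 1.0 (a balanced binary class distribution)
print(entropy([1.0, 0.0]))  # → 0.0 (a pure node has zero entropy)
```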
The information gain is the change in entropy caused by a split.
To illustrate this idea, we use a decision tree as an example. In a decision tree algorithm, we split a node. Before splitting, we assign a label \(m\) to the node; its entropy is

\[
S_m = - \sum_i p_i \log p_i,
\]

where the probabilities \(p_i\) are computed over the samples in node \(m\).
After the split, two groups contribute to the entropy, group \(L\) and group \(R\) [^1]:

\[
S'_m = p_L S_L + p_R S_R,
\]

where \(p_L\) and \(p_R\) are the fractions of samples falling into the two groups, and \(S_L\) and \(S_R\) are the entropies of the groups. Suppose we have 100 samples before splitting, with 29 samples in the left group and 71 samples in the right group; then \(p_L = 29/100\) and \(p_R = 71/100\).
The information gain is the difference between \(S_m\) and \(S'_m\):

\[
\operatorname{Gain} = S_m - S'_m.
\]
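The whole calculation can be sketched end to end. The group sizes (29 and 71 out of 100) come from the example above; the per-class counts inside each group are made-up numbers chosen only for illustration:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical class counts before the split: 60 of class A, 40 of class B.
parent = entropy([60 / 100, 40 / 100])  # S_m

# Group fractions from the example: 29 samples left, 71 samples right.
p_L, p_R = 29 / 100, 71 / 100

# Hypothetical class distributions within each group.
S_L = entropy([25 / 29, 4 / 29])
S_R = entropy([35 / 71, 36 / 71])

weighted = p_L * S_L + p_R * S_R  # S'_m, the post-split entropy
gain = parent - weighted          # information gain S_m - S'_m
print(gain)
```

A split that produces purer groups yields a larger gain; a split that leaves the class mix unchanged yields a gain of zero.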

[^1]: Shalev-Shwartz S, Ben-David S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. doi:10.1017/CBO9781107298019.