Gradient Boosted Trees

Boosted trees are another tree-based ensemble method. Like a random forest, a boosted trees model makes predictions by combining the predictions of its individual trees. However, instead of averaging them, boosted trees form an additive model: the prediction \(f(\mathbf X)\) is the sum of the individual tree predictions¹,

\[ f(\mathbf X) = \sum_{t=1}^{T} f_t(\mathbf X), \]

where \(f_t(\mathbf X)\) is the prediction of tree \(t\) and \(T\) is the total number of trees. Given this setup, training is very different from that of random forests. As of 2023, two popular implementations of boosted trees are LightGBM and XGBoost. Training a boosted trees model means finding a sequence of trees

\[ \{ f_1, f_2, \cdots, f_t, \cdots, f_T \}. \]
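To make the additive structure concrete, the following plain-Python sketch represents each tree as a depth-1 stump. `Stump`, `boosted_predict`, and `forest_predict` are hypothetical helpers written for illustration, not part of LightGBM or XGBoost. The boosted prediction sums the tree outputs as in the equation for \(f(\mathbf X)\), while a random forest would average them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Hypothetical depth-1 regression tree that splits on a single feature."""
    feature: int       # index of the feature used for the split
    threshold: float   # samples with x[feature] <= threshold go to the left leaf
    left_value: float  # output of the left leaf
    right_value: float # output of the right leaf

    def predict_one(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def boosted_predict(trees: List[Stump], x: Sequence[float]) -> float:
    """Additive model: f(x) = f_1(x) + ... + f_T(x)."""
    return sum(tree.predict_one(x) for tree in trees)


def forest_predict(trees: List[Stump], x: Sequence[float]) -> float:
    """Random-forest style aggregation, for contrast: the mean of the tree outputs."""
    return sum(tree.predict_one(x) for tree in trees) / len(trees)


trees = [
    Stump(feature=0, threshold=0.5, left_value=1.0, right_value=3.0),
    Stump(feature=1, threshold=2.0, left_value=-0.5, right_value=0.5),
    Stump(feature=0, threshold=1.5, left_value=0.0, right_value=0.25),
]

x = [1.0, 3.0]
print(boosted_predict(trees, x))  # 3.0 + 0.5 + 0.0 = 3.5
print(forest_predict(trees, x))
```

Because the outputs are summed, each tree in a boosted model only needs to contribute a correction to the trees before it, rather than a full prediction on its own.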

For a specified loss function \(\mathscr L(\mathbf y, \hat{\mathbf y})\), the sequence of trees reduces the loss step by step. At step \(t\), the loss is

\[ \mathscr L(\mathbf y, f_1(\mathbf X) + f_2(\mathbf X) + \cdots + f_t(\mathbf X) ). \]
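A minimal sketch of this stage-wise process, assuming squared-error loss \(\mathscr L = \tfrac12 \sum_i (y_i - \hat y_i)^2\), a single input feature, and depth-1 trees (stumps). For this loss the negative gradient with respect to \(\hat y_i\) is just the residual \(y_i - \hat y_i\), so each new tree is fit to the current residuals. The learning rate (shrinkage) of 0.5 is an illustrative choice, and `fit_stump` and `boost` are hypothetical helpers; this is plain gradient boosting, not the exact algorithm used by XGBoost or LightGBM.

```python
from typing import List, Sequence, Tuple


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares stump on one feature; returns (threshold, left_value, right_value)."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        # Each leaf predicts the mean residual of the samples it contains.
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return thr, lv, rv


def squared_loss(ys: Sequence[float], preds: Sequence[float]) -> float:
    return 0.5 * sum((y - p) ** 2 for y, p in zip(ys, preds))


def boost(xs: Sequence[float], ys: Sequence[float], n_trees: int = 5, learning_rate: float = 0.5):
    preds = [0.0] * len(ys)  # f(X) = 0 before any tree is added
    trees: List[Tuple[float, float, float]] = []
    losses = [squared_loss(ys, preds)]
    for _ in range(n_trees):
        # Negative gradient of 1/2 (y - yhat)^2 w.r.t. yhat is the residual y - yhat.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        tree = (thr, learning_rate * lv, learning_rate * rv)  # shrunken tree f_t
        trees.append(tree)
        # Additive update: the model is now f_1 + ... + f_t.
        preds = [p + (tree[1] if x <= thr else tree[2]) for x, p in zip(xs, preds)]
        losses.append(squared_loss(ys, preds))
    return trees, losses


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
trees, losses = boost(xs, ys)
for t, loss in enumerate(losses):
    print(f"step {t}: loss = {loss:.4f}")
```

Since each stump's leaf values are the mean residuals in its leaves, adding a shrunken stump (learning rate between 0 and 2) cannot increase the squared loss, so the printed losses are non-increasing.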

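To optimize the model, we add at each step the tree that reduces the loss the most. In practice, the loss is approximated to make this search numerically tractable; XGBoost, for example, uses a second-order Taylor expansion of the loss around the current prediction².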
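As an illustration of the second-order approximation described in the XGBoost paper², write \(g_i\) and \(h_i\) for the first and second derivatives of the loss with respect to the current prediction of sample \(i\), and regularize each new tree by \(\gamma J + \tfrac12 \lambda \sum_j w_j^2\), where \(J\) is the number of leaves and \(w_j\) are the leaf values. With \(G_j = \sum_{i \in j} g_i\) and \(H_j = \sum_{i \in j} h_i\) summed over the samples in leaf \(j\), the approximate loss is minimized by \(w_j^* = -G_j / (H_j + \lambda)\), and splitting a leaf into left and right children reduces it by \(\tfrac12 \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma\). The sketch below evaluates these quantities for log loss on a toy dataset; the helper names are hypothetical and this is not XGBoost's own code.

```python
import math
from typing import List, Sequence, Tuple


def logistic_grad_hess(y: Sequence[int], raw_score: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First and second derivatives of the log loss w.r.t. the raw score (logit)."""
    g, h = [], []
    for yi, s in zip(y, raw_score):
        p = 1.0 / (1.0 + math.exp(-s))
        g.append(p - yi)         # dL/ds
        h.append(p * (1.0 - p))  # d^2L/ds^2
    return g, h


def leaf_weight(g: Sequence[float], h: Sequence[float], lam: float = 1.0) -> float:
    """Optimal leaf value w* = -G / (H + lambda) under the second-order approximation."""
    return -sum(g) / (sum(h) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam: float = 1.0, gamma: float = 0.0) -> float:
    """Approximate loss reduction from splitting one leaf into left/right children."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)

    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


y = [0, 0, 1, 1]
raw = [0.0, 0.0, 0.0, 0.0]  # current raw scores: p = 0.5 for every sample
g, h = logistic_grad_hess(y, raw)
# Candidate split: the first two samples go left, the last two go right.
print(leaf_weight(g[:2], h[:2]), leaf_weight(g[2:], h[2:]))
print(split_gain(g[:2], h[:2], g[2:], h[2:]))
```

Here the split separates the two classes, so the left leaf gets a negative value and the right leaf a positive one, and the gain is positive.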

The XGBoost documentation and the original XGBoost paper explain the idea nicely with examples.

Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. arXiv [cs.LG]. 2016. Available: http://arxiv.org/abs/1603.02754

There is more than one realization of gradient boosted trees³⁴.

1. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: Data mining, inference, and prediction. Springer Science & Business Media, 2013.
2. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. 2016. http://arxiv.org/abs/1603.02754.
3. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W et al. LightGBM: A highly efficient gradient boosting decision tree. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S et al. (eds). Advances in neural information processing systems. Curran Associates, Inc., 2017. https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf.
4. Shi Y, Li J, Li Z. Gradient boosting with Piece-Wise linear regression trees. 2018. http://arxiv.org/abs/1802.05640.

Contributors: LM