Gradient Boosted Trees
Boosted trees are another ensemble method built on trees. Similar to random forests, a boosted trees model makes predictions by combining the predictions of the individual trees. However, instead of averaging, boosted trees are additive models where the prediction \(f(\mathbf X)\) is the sum of the individual tree predictions^{1},

\[
f(\mathbf X) = \sum_{t=1}^{T} f_t(\mathbf X),
\]
where \(f_t(\mathbf X)\) is the prediction of tree \(t\) and \(T\) is the total number of trees. Given such a setup, training becomes very different from random forests: training a boosted trees model finds a sequence of trees \(f_1, f_2, \ldots, f_T\), built one tree at a time. As of 2023, the two most popular implementations of boosted trees are LightGBM and XGBoost.
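The additive structure can be illustrated with a minimal from-scratch sketch: for squared loss, each new tree is fit to the residuals of the model so far, and the final prediction is the (scaled) sum of the tree predictions. This is a toy illustration using scikit-learn's `DecisionTreeRegressor`, not how LightGBM or XGBoost are actually implemented; the learning rate, depth, and number of trees are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Additive model: f(X) = sum_t f_t(X), built one tree at a time.
learning_rate = 0.1
trees = []
prediction = np.zeros_like(y)
for t in range(50):
    # For squared loss, fitting the negative gradient means fitting the residual.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)

# The final prediction is the scaled sum of the individual tree predictions.
f_X = learning_rate * sum(tree.predict(X) for tree in trees)
assert np.allclose(f_X, prediction)
```

Each tree only has to correct what the previous trees got wrong, which is why the trees must be trained sequentially rather than independently as in a random forest.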
For a specified loss function \(\mathscr L(\mathbf y, \hat{\mathbf y})\), the sequence of trees reduces the loss step by step. At step \(i\), the loss is

\[
\mathscr L^{(i)} = \sum_{j} \mathscr L\left(y_j,\ \hat y_j^{(i-1)} + f_i(\mathbf x_j)\right),
\]

where \(\hat y_j^{(i-1)}\) is the prediction for sample \(j\) from the first \(i-1\) trees.
To optimize the model, at each step we add the tree that reduces the loss the most; in practice, approximations are applied to make the numerical computations tractable^{2}.
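For instance, XGBoost^{2} approximates the step-\(i\) loss by a second-order Taylor expansion around the current prediction,

\[
\mathscr L^{(i)} \approx \sum_{j} \left[ \mathscr L\left(y_j, \hat y_j^{(i-1)}\right) + g_j\, f_i(\mathbf x_j) + \frac{1}{2} h_j\, f_i^2(\mathbf x_j) \right],
\]

where \(g_j\) and \(h_j\) are the first and second derivatives of the loss with respect to \(\hat y_j^{(i-1)}\). Dropping the constant first term leaves a quadratic objective in \(f_i\), which makes finding the best tree at each step computationally cheap for any twice-differentiable loss.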
The XGBoost documentation and the original paper on XGBoost explain the idea nicely with examples.
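The step-by-step loss reduction is easy to observe in practice. The sketch below uses scikit-learn's `GradientBoostingRegressor` as a readily available stand-in (LightGBM and XGBoost expose similar APIs); its `staged_predict` method yields the prediction after each added tree, so we can watch the training loss fall as the sequence grows.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
model.fit(X, y)

# Training MSE after each boosting step: one entry per tree in the sequence.
losses = [np.mean((y - pred) ** 2) for pred in model.staged_predict(X)]
print(f"training loss after 1 tree: {losses[0]:.1f}, after 100 trees: {losses[-1]:.1f}")
```

Plotting `losses` against the step index gives the familiar monotonically decreasing training-loss curve of a boosted model.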
There is more than one realization of gradient boosted trees^{3}^{4}.

Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: Data mining, inference, and prediction. Springer Science & Business Media, 2013. ↩

Chen T, Guestrin C. XGBoost: A scalable tree boosting system. 2016. http://arxiv.org/abs/1603.02754. ↩

Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W et al. LightGBM: A highly efficient gradient boosting decision tree. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S et al. (eds). Advances in neural information processing systems. Curran Associates, Inc., 2017. https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf. ↩

Shi Y, Li J, Li Z. Gradient boosting with piece-wise linear regression trees. 2018. http://arxiv.org/abs/1802.05640. ↩