
Forecasting with Trees Using Darts

Darts provides wrappers for tree-based models. In this section, we benchmark random forest and gradient-boosted decision tree (GBDT) models on the famous air passenger dataset. The benchmarks expose a key advantage and a key disadvantage of tree-based models in forecasting.

Just Run It

The notebooks created to produce the results in this section can be found here for random forest and here for GBDT.

The Simple Random Forest

We will build different models to demonstrate the strengths and weaknesses of random forest models. The focus is on in-sample and out-of-sample predictions. Trees are known to be poor at extrapolating into regions where the out-of-sample distribution differs from that of the training data, because each leaf assigns a constant value. Real-world time series are often non-stationary and heteroscedastic, which implies that the distribution during the test phase may differ from the distribution of the training data.
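The constant-leaf argument can be checked on a toy problem. The following is a minimal sketch, not the notebook's code: it assumes scikit-learn's `RandomForestRegressor` and a made-up noiseless line as training data. Every input beyond the training range ends up in the same leaves as the largest training input, so the prediction stays flat.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data (an assumption for illustration): a noiseless upward line.
x_train = np.arange(100, dtype=float).reshape(-1, 1)
y_train = 2.0 * x_train.ravel()

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(x_train, y_train)

# Inputs far outside the training range follow the same path through
# every tree as the largest training input, so they share its leaf values.
x_far = np.array([[150.0], [1000.0]])
pred_far = forest.predict(x_far)
pred_edge = forest.predict(x_train[-1:])

print("prediction at the training edge:", pred_edge[0])
print("predictions far outside the range:", pred_far)
```

The forest never predicts above the largest target it saw during training, no matter how far the input moves.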


We choose the famous air passenger data. The dataset shows the number of air passengers in each month.

AP Data

"Simply Wrap the Model on the Data"

A naive idea is to simply wrap a tree-based model on the data. Here we choose RandomForest from scikit-learn.
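To make "wrapping" concrete, here is a hedged sketch of what such a setup does under the hood: the model regresses each value on a window of its own past values and forecasts recursively. The Darts wrapper handles the lag construction internally; the sketch spells it out with plain scikit-learn so the mechanics are visible. The series is a synthetic stand-in with growth and seasonality, not the air passenger data, and the helper `make_lagged` as well as the choice of 12 lags are our own assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly series (assumption): exponential growth with
# multiplicative yearly seasonality.
t = np.arange(144)
series = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))

n_lags, n_test = 12, 36
train, test = series[:-n_test], series[-n_test:]


def make_lagged(values, n_lags):
    """Pair each window of n_lags consecutive values with the value that follows."""
    X = np.array([values[i : i + n_lags] for i in range(len(values) - n_lags)])
    y = values[n_lags:]
    return X, y


X_train, y_train = make_lagged(train, n_lags)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Recursive multi-step forecast: each prediction is fed back in as a lag.
history = list(train[-n_lags:])
forecast = []
for _ in range(n_test):
    next_value = model.predict(np.array(history[-n_lags:]).reshape(1, -1))[0]
    forecast.append(next_value)
    history.append(next_value)
forecast = np.array(forecast)

print("largest training target:", y_train.max())
print("largest forecast:", forecast.max())
print("largest actual test value:", test.max())
```

Because every leaf stores an average of training targets, the forecast cannot rise above the largest training target, while the test values keep growing.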

Simple RF

The predictions are quite off. However, the in-sample predictions, i.e., predictions for the time range that the model has already seen during training, are not nearly as bad.

In sample Simple RF

Detrend and Cheating

To confirm that this is due to a mismatch between the in-sample and out-of-sample distributions, we plot histograms of the training series and the test series.
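A comparison of this kind can be sketched with NumPy alone. The series below is a synthetic stand-in (an assumption), and the 108/36 split is chosen for illustration only. Shared bin edges keep the two histograms comparable.

```python
import numpy as np

# Synthetic stand-in for a growing seasonal series (assumption).
t = np.arange(144)
series = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
train, test = series[:108], series[108:]

# Shared bin edges so both histograms cover the same value ranges.
bins = np.histogram_bin_edges(series, bins=10)
train_hist, _ = np.histogram(train, bins=bins)
test_hist, _ = np.histogram(test, bins=bins)

# Share of test values that exceed every value seen during training.
share_above_train = float(np.mean(test > train.max()))

print("train counts per bin:", train_hist)
print("test counts per bin: ", test_hist)
print("share of test values above the training maximum:", share_above_train)
```

On this stand-in series a sizeable share of the test values lies above the training maximum, so the two histograms only partly overlap.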


This hints that we should at least detrend the data.


For now, we cheat a bit and detrend the whole series at once, test period included, to get a grasp of the idea.

Detrended RF

Distribution of Detrended Data


Without Information Leak

The above method leads to a great result, but at the cost of information leakage during the detrending. Nevertheless, it indicates that trees can perform well on out-of-sample predictions if we only predict the cyclic part of the series. In a real-world case, however, we have to predict the trend accurately for this to work. Tricks such as Box-Cox transformations make the trend easier to reconstruct.

To stabilize the variance, we perform a Box-Cox transformation.
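As a hedged illustration of this step, the sketch below uses SciPy's `boxcox` and `inv_boxcox` on a synthetic stand-in series. Estimating the parameter lambda on the training window only and reusing it for the test window is our own choice to avoid leakage; the notebook may handle this differently.

```python
import numpy as np
from scipy.special import inv_boxcox
from scipy.stats import boxcox

# Synthetic stand-in for a positive, growing seasonal series (assumption).
t = np.arange(144)
series = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
train, test = series[:108], series[108:]

# With no lambda given, boxcox estimates it by maximum likelihood.
train_bc, lam = boxcox(train)

# Reuse the training lambda for the test window.
test_bc = boxcox(test, lmbda=lam)

# The inverse transform maps values back to the original scale.
train_back = inv_boxcox(train_bc, lam)

print("estimated lambda:", lam)
```

Box-Cox requires strictly positive data, which holds for passenger counts.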

Box Cox

With the transformed data, we build a simple linear trend using the training dataset and extrapolate the trend to the dates of the prediction.
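A linear trend of this kind can be sketched with `np.polyfit`. The input below is a made-up series that plays the role of already-transformed data (an assumption); the trend is fitted on the training window only and then evaluated on the later time steps.

```python
import numpy as np

# Stand-in for transformed data (assumption): linear trend plus a yearly cycle.
t = np.arange(144, dtype=float)
transformed = 4.6 + 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 12)

n_train = 108
t_train, t_test = t[:n_train], t[n_train:]

# Fit slope and intercept on the training window only.
slope, intercept = np.polyfit(t_train, transformed[:n_train], deg=1)

# Evaluate the same line inside and beyond the training window.
trend_train = slope * t_train + intercept
trend_test = slope * t_test + intercept

# What remains after removing the trend is the part the trees will model.
detrended_train = transformed[:n_train] - trend_train

print("fitted slope:", slope, "intercept:", intercept)
```

No test value enters the fit, so the extrapolated line carries no information from the prediction period.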

Linear Trend

Finally, we fit a random forest model on the detrended data, i.e., the Box-Cox transformed data minus the linear trend, and then reconstruct the predictions, i.e., add the linear trend back to the predictions and apply the inverse Box-Cox transformation. We observe much better performance than with the first RF we built.
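The whole chain can be sketched end to end as follows. This is an illustration under stated assumptions rather than the notebook's code: the series is synthetic, the transform is fixed to the logarithm (the Box-Cox transform with lambda equal to 0) to keep the example short, and the lag count and forest settings are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in series (assumption).
t = np.arange(144, dtype=float)
series = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))

n_lags, n_test = 12, 36
n_train = len(series) - n_test
train, test = series[:n_train], series[n_train:]

# 1. Variance-stabilising transform: log, i.e. Box-Cox with lambda = 0.
train_log = np.log(train)

# 2. Linear trend fitted on the training window, evaluated on all dates.
slope, intercept = np.polyfit(t[:n_train], train_log, deg=1)
trend = slope * t + intercept

# 3. Random forest on lagged values of the detrended training series.
residual = train_log - trend[:n_train]
X = np.array([residual[i : i + n_lags] for i in range(n_train - n_lags)])
y = residual[n_lags:]
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# 4. Recursive forecast of the detrended part.
window = list(residual[-n_lags:])
residual_forecast = []
for _ in range(n_test):
    step = model.predict(np.array(window[-n_lags:]).reshape(1, -1))[0]
    residual_forecast.append(step)
    window.append(step)

# 5. Reconstruct: add the extrapolated trend, then invert the transform.
forecast = np.exp(np.array(residual_forecast) + trend[n_train:])

mape = float(np.mean(np.abs(forecast - test) / test))
print("largest training value:", train.max())
print("largest forecast:", forecast.max())
print("MAPE on the test window:", mape)
```

The forest still predicts only values it has seen, but it now sees the bounded cyclic part; the growth is carried by the extrapolated trend, which is why the reconstructed forecast can exceed the training maximum.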

box cox + linear trend

Comparisons of the Three Random Forest Models

Visual inspection shows that cheating leads to the best result, followed by the simple linear detrending model.


To formally benchmark the results, we computed several metrics.
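The text does not list which metrics were used, so the sketch below assumes three common ones (MAE, RMSE and MAPE) and evaluates them on made-up numbers. The function names are our own.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error."""
    return float(np.mean(np.abs(actual - predicted)))


def rmse(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction of the actual value."""
    return float(np.mean(np.abs((actual - predicted) / actual)))


# Made-up values for illustration only.
actual = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 180.0, 400.0])

scores = {
    "mae": mae(actual, predicted),
    "rmse": rmse(actual, predicted),
    "mape": mape(actual, predicted),
}
print(scores)
```

Applying the same functions to each model's forecast on the same test window gives a like-for-like comparison.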

Metric comparison

Gradient Boosted Trees

Similar behavior is also observed for gradient-boosted decision trees (GBDT).
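The flat extrapolation is easy to reproduce for boosted trees as well. The sketch below is a toy check that assumes scikit-learn's `GradientBoostingRegressor` and a made-up noiseless line; it is not the model used in the notebook.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy data (assumption): a noiseless upward line.
x_train = np.arange(100, dtype=float).reshape(-1, 1)
y_train = 2.0 * x_train.ravel()

gbdt = GradientBoostingRegressor(n_estimators=200, random_state=0)
gbdt.fit(x_train, y_train)

# Beyond the training range, each boosted tree routes the input to the
# same leaf as the largest training input, so the sum of trees is flat.
gbdt_far = gbdt.predict(np.array([[150.0], [1000.0]]))
gbdt_edge = gbdt.predict(x_train[-1:])

print("prediction at the training edge:", gbdt_edge[0])
print("predictions far outside the range:", gbdt_far)
```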


Metric comparison

Trees are Powerful

Up to this point, we may get the feeling that trees are not the best choice for forecasting. As a matter of fact, trees are widely used in many competitions and have achieved a lot in forecasting [2]. Apart from being simple and robust, trees can also be made probabilistic. Trees are also attractive as a first model to try because they usually work quite well out of the box [1].
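One common route to probabilistic tree models is quantile regression. The sketch below assumes scikit-learn's `GradientBoostingRegressor` with its quantile loss and fits one model per quantile on made-up noisy data; it only illustrates the idea and does not claim to be the approach taken in the cited work.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Made-up noisy data for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.5, size=500)

# One boosted model per quantile; alpha selects the target quantile.
quantiles = (0.1, 0.5, 0.9)
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
    for q in quantiles
}

# Predicted quantiles on a grid form a band around the median.
X_new = np.linspace(0, 10, 50).reshape(-1, 1)
bands = {q: m.predict(X_new) for q, m in models.items()}

for q in quantiles:
    print(f"mean predicted {q:.0%} quantile:", bands[q].mean())
```

The gap between the lower and upper quantile gives a rough prediction interval.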

  1. "Out of the box" sounds like something easy to do. However, anyone who has read the list of parameters of LightGBM will quickly lose the impression that it is easy.

  2. Januschowski T, Wang Y, Torkkola K, Erkkilä T, Hasson H, Gasthaus J. Forecasting with trees. International Journal of Forecasting 2022; 38: 1473–1481.

Contributors: LM