Forecasting with Trees Using Darts
Darts provides wrappers for tree-based models. In this section, we benchmark a random forest and gradient-boosted decision trees (GBDT) on the famous air passenger dataset. Through these benchmarks, we will see the key advantages and disadvantages of tree-based models in forecasting.
Just Run It
The notebooks used to produce the results in this section can be found here for random forest and here for GBDT.
The Simple Random Forest
We will build several models to demonstrate the strengths and weaknesses of random forests, focusing on in-sample and out-of-sample predictions. Trees are known to extrapolate poorly into regions where the out-of-sample distribution differs from the training data: each leaf assigns a constant value, so a prediction cannot leave the range of the training targets. Real-world time series are often non-stationary and heteroscedastic, which means the distribution during the test phase may differ from that of the training data.
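To make the constant-leaf argument concrete, here is a minimal sketch using scikit-learn's `DecisionTreeRegressor`, our choice for illustration (a random forest averages many such trees and inherits the same limit). The toy data are made up for this example: an exact line y = 2x observed only for x in [0, 99].

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (not the air passenger series): a perfectly linear target y = 2x
# observed only for x in [0, 99].
x_train = np.arange(100, dtype=float).reshape(-1, 1)
y_train = 2 * x_train.ravel()

tree = DecisionTreeRegressor(random_state=0).fit(x_train, y_train)

# Inputs far beyond the training range all fall into the right-most leaf,
# so they receive that leaf's constant value instead of following the line.
x_new = np.array([[150.0], [200.0], [1000.0]])
pred = tree.predict(x_new)
print(pred)  # every entry equals 198.0, the largest training target
```

A random forest only averages leaf values like this one, so its forecasts are likewise bounded by the smallest and largest training targets.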
Data
We use the famous air passenger dataset, which records the number of air passengers in each month.
"Simply Wrap the Model on the Data"
A naive idea is to simply wrap a tree-based model around the data. Here we choose RandomForest from scikit-learn.
The predictions are quite far off. However, if we look at the in-sample predictions, i.e., the time range the model has already seen during training, we do not observe such bad predictions.
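The naive setup can be sketched in plain scikit-learn, roughly mirroring what Darts' `RandomForest` wrapper does: lagged values become features and multi-step forecasts are produced recursively. This is a minimal sketch, not the notebook's code. The series is a synthetic stand-in (exponential growth times a yearly sine), and the 12 lags and the 108/36 train/test split are our assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the air passenger data (NOT the real series):
# 144 monthly values with exponential growth and multiplicative seasonality.
t = np.arange(144)
y = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
y_train, y_test = y[:108], y[108:]  # assumed split: last 36 months held out
LAGS = 12  # assumed: one year of monthly lags as features


def make_lagged(series, lags):
    """Each row is a window series[i:i+lags]; its target is series[i+lags]."""
    X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
    return X, series[lags:]


def recursive_forecast(model, history, horizon, lags):
    """Predict one step ahead, append it as the newest lag, repeat."""
    window = list(history[-lags:])
    for _ in range(horizon):
        x_next = np.array(window[-lags:]).reshape(1, -1)
        window.append(model.predict(x_next)[0])
    return np.array(window[lags:])


X, target = make_lagged(y_train, LAGS)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, target)
naive_pred = recursive_forecast(rf, y_train, len(y_test), LAGS)

# Each prediction averages training targets, so it can never exceed
# y_train.max(), while the test series climbs above it.
print(naive_pred.max() <= y_train.max(), y_test.max() > y_train.max())  # True True
```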
Detrend and Cheating
To confirm that this is caused by a mismatch between the in-sample and out-of-sample distributions, we plot the histograms of the training series and the test series.
This hints that we should at least detrend the data.
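The histogram comparison can also be reduced to a few numbers. This sketch runs on a synthetic stand-in series (exponential growth times a yearly sine, not the real data) with an assumed 108/36 split: it counts how many test points lie above the training maximum and bins both sets on shared edges. The 10 bins are an arbitrary choice.

```python
import numpy as np

# Synthetic stand-in for the air passenger series (NOT the real data).
t = np.arange(144)
y = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
y_train, y_test = y[:108], y[108:]  # assumed split

# How much of the test period lies above anything seen in training?
frac_above = np.mean(y_test > y_train.max())

# Histograms on shared bin edges, the numeric version of the plot.
edges = np.linspace(y.min(), y.max(), 11)
hist_train, _ = np.histogram(y_train, bins=edges)
hist_test, _ = np.histogram(y_test, bins=edges)
print(f"{frac_above:.0%} of test points exceed the training maximum")
print("train counts:", hist_train)
print("test counts: ", hist_test)
```

The highest bins are populated only by test points, which is the range a tree has never seen.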
However, to get a grasp of the idea, we first cheat a bit by detrending the whole series, test period included.
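The text does not say how the whole series was detrended for the cheat. As an assumption, this sketch fits a straight line to the log of all 144 points of a synthetic stand-in series, test months included, which is exactly where the leak comes from. After detrending, the train and test means nearly coincide, whereas the raw means are far apart.

```python
import numpy as np

# Synthetic stand-in for the air passenger series (NOT the real data).
t = np.arange(144)
y = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
n_train = 108  # assumed split

# The "cheat" (assumed method): the trend is fitted on ALL 144 points,
# test months included, so future information leaks into training.
log_y = np.log(y)
slope, intercept = np.polyfit(t, log_y, deg=1)
detrended = log_y - (intercept + slope * t)

for name, s in [("raw", y), ("detrended", detrended)]:
    train, test = s[:n_train], s[n_train:]
    print(f"{name:>9}: train mean {train.mean():8.3f} | test mean {test.mean():8.3f}")
```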
Distribution of Detrended Data
Without Information Leak
The above method leads to a great result, but only because information from the test period leaks in during detrending. Nevertheless, it indicates how well trees can perform out of sample when they only have to predict the cyclic part of the series. In a real-world case, however, we must also predict the trend accurately for this to work. To reconstruct the trend better, there are also tricks such as Box-Cox transformations.
To stabilize the variance, we perform a Box-Cox transformation.
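For reference, the Box-Cox transform with parameter λ maps a positive value y to (y^λ - 1)/λ for λ ≠ 0 and to log y for λ = 0. Below is a minimal NumPy sketch with its inverse. In practice λ is usually estimated from the data, for example with `scipy.stats.boxcox` or Darts' `BoxCox` transformer; here we simply fix λ = 0 (a plain log) and apply it to a synthetic series with multiplicative seasonality to show the variance stabilization.

```python
import numpy as np


def box_cox(y, lmbda):
    """Box-Cox transform of positive data; lmbda == 0 is the natural log."""
    y = np.asarray(y, dtype=float)
    if lmbda == 0:
        return np.log(y)
    return (y ** lmbda - 1) / lmbda


def inv_box_cox(z, lmbda):
    """Inverse of box_cox for the same lmbda."""
    z = np.asarray(z, dtype=float)
    if lmbda == 0:
        return np.exp(z)
    return (lmbda * z + 1) ** (1 / lmbda)


def yearly_swing(s):
    """Peak-to-peak range of the first and the last 12 months."""
    return float(np.ptp(s[:12])), float(np.ptp(s[-12:]))


# Synthetic series (NOT the real data) whose seasonal swing grows with level.
t = np.arange(144)
y = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
z = box_cox(y, 0.0)

print("raw swing (first, last year):", yearly_swing(y))
print("log swing (first, last year):", yearly_swing(z))
```

On the raw scale the last year swings several times more than the first; after the log transform the two swings are equal.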
With the transformed data, we fit a simple linear trend on the training dataset and extrapolate it to the prediction dates.
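A minimal sketch of the trend step, assuming λ = 0 so the Box-Cox transform is a plain log. The line is fitted with `np.polyfit` on the training window only and then evaluated at the test dates. The synthetic series and its growth rate of 0.01 per month are our assumptions, which lets us check the fitted slope against a known value.

```python
import numpy as np

# Synthetic stand-in (NOT the real data) with a known log-growth of 0.01/month.
t = np.arange(144)
y = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
z = np.log(y)                       # Box-Cox with lambda = 0
t_train, t_test = t[:108], t[108:]  # assumed split
z_train = z[:108]

# Fit only on the training window, then extrapolate to the forecast dates.
slope, intercept = np.polyfit(t_train, z_train, deg=1)
trend_train = intercept + slope * t_train
trend_test = intercept + slope * t_test
detrended_train = z_train - trend_train
print(f"fitted slope {slope:.4f} per month (generating value: 0.01)")
```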
Finally, we fit a random forest model on the detrended data, i.e., the Box-Cox-transformed data minus the linear trend, and then reconstruct the predictions by adding the linear trend back and applying the inverse Box-Cox transformation. We observe a much better performance than with the first RF we built.
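The whole leak-free pipeline, compared against a naive forest on raw values, fits in one sketch. It assumes a synthetic stand-in series, λ = 0, 12 lags, and a 108/36 split; none of these come from the original notebooks, and the printed MAPE values describe only this toy setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in (NOT the real data) and an assumed 108/36 split.
t = np.arange(144)
y = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
n_train, lags = 108, 12
y_train, y_test = y[:n_train], y[n_train:]
horizon = len(y_test)


def rf_forecast(series, horizon, lags):
    """Fit a random forest on lag windows of `series`; forecast recursively."""
    X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X, series[lags:])
    window = list(series[-lags:])
    for _ in range(horizon):
        window.append(rf.predict(np.array(window[-lags:]).reshape(1, -1))[0])
    return np.array(window[lags:])


def mape(actual, pred):
    return 100 * np.mean(np.abs((actual - pred) / actual))


# Baseline: random forest on the raw values.
naive = rf_forecast(y_train, horizon, lags)

# Pipeline: log (Box-Cox with lambda = 0), subtract a trend fitted on the
# training window only, model the residual, then add the extrapolated
# trend back and invert the transform.
z_train = np.log(y_train)
slope, intercept = np.polyfit(t[:n_train], z_train, deg=1)
trend = intercept + slope * t
resid_forecast = rf_forecast(z_train - trend[:n_train], horizon, lags)
detrended = np.exp(resid_forecast + trend[n_train:])

print(f"naive RF MAPE:     {mape(y_test, naive):.1f}%")
print(f"detrended RF MAPE: {mape(y_test, detrended):.1f}%")
```

The forest now only has to reproduce a bounded seasonal residual, and the extrapolation beyond the training range is delegated to the linear trend.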
Comparison of the Three Random Forest Models
Visual inspection shows that cheating leads to the best result, followed by the simple linear detrend model.
To formally benchmark the results, we computed several metrics.
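The section does not list which metrics were computed. As an illustration only, here are common point-forecast metrics written out in NumPy; Darts ships ready-made versions in `darts.metrics` (for example `mae`, `rmse`, `mape` and `smape`). The sMAPE variant below, with the mean of absolute actual and forecast values in the denominator, is one of several definitions in use.

```python
import numpy as np


def forecast_metrics(actual, pred):
    """Common point-forecast error metrics (illustrative selection)."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - actual
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE (%)": float(100 * np.mean(np.abs(err / actual))),
        "sMAPE (%)": float(100 * np.mean(2 * np.abs(err) / (np.abs(actual) + np.abs(pred)))),
    }


metrics = forecast_metrics([100, 200, 300], [110, 190, 330])
print({name: round(value, 2) for name, value in metrics.items()})
# {'MAE': 16.67, 'RMSE': 19.15, 'MAPE (%)': 8.33, 'sMAPE (%)': 8.06}
```

RMSE weights the single large miss (30) more heavily than MAE does, which is why the two disagree on this toy input.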
Gradient Boosted Trees
Similar behavior is also observed for gradient-boosted decision trees (GBDT).
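Boosting hits the same ceiling because every boosted tree also ends in constant leaves. This sketch uses scikit-learn's `GradientBoostingRegressor` as a stand-in for whichever GBDT library the notebook uses (LightGBM, for instance), on a made-up toy line y = 2x observed for x in [0, 99]. Every input beyond the training range receives exactly the model's fitted value at x = 99.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy data (not the air passenger series): y = 2x observed for x in [0, 99].
x_train = np.arange(100, dtype=float).reshape(-1, 1)
y_train = 2 * x_train.ravel()

gbdt = GradientBoostingRegressor(n_estimators=200, random_state=0)
gbdt.fit(x_train, y_train)

x_new = np.array([[150.0], [200.0], [1000.0]])
pred = gbdt.predict(x_new)
fitted_at_99 = gbdt.predict(x_train[-1:])[0]

# Every split threshold lies below 99, so x = 99 and all larger inputs
# follow the same path through every tree and get the same prediction.
print(pred, fitted_at_99)
```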
Trees are Powerful
Up to this point, we may get the feeling that trees are not the best choice for forecasting. In fact, trees are widely used in many competitions and have achieved a lot in forecasting^{2}. Apart from being simple and robust, trees can also be made probabilistic. Trees are also attractive as a first model to try because they usually work quite well out of the box^{1}.
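One common way to make trees probabilistic is quantile regression: train one model per quantile with the pinball (quantile) loss. The sketch below uses scikit-learn's `GradientBoostingRegressor(loss="quantile")` on made-up noisy data. It is one option among several and not necessarily the approach discussed in the cited paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)


def sample(n):
    """Toy data: a sine curve plus Gaussian noise with standard deviation 0.3."""
    x = rng.uniform(0, 10, size=(n, 1))
    return x, np.sin(x).ravel() + rng.normal(scale=0.3, size=n)


x_train, y_train = sample(500)
x_test, y_test = sample(500)

# One model per quantile: the 10% and 90% models bound an 80% interval.
quantiles = (0.1, 0.5, 0.9)
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(x_train, y_train)
    for q in quantiles
}
lower, median, upper = (models[q].predict(x_test) for q in quantiles)

coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"held-out coverage of the nominal 80% interval: {coverage:.0%}")
```

Fitting quantiles independently is simple, but nothing forces them to stay ordered, so crossing quantiles are possible on some inputs.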

1. "Out of the box" sounds easy. However, anyone who reads through the list of LightGBM parameters will quickly abandon the thought that it is "easy".

2. Januschowski T, Wang Y, Torkkola K, Erkkilä T, Hasson H, Gasthaus J. Forecasting with trees. International Journal of Forecasting 2022; 38: 1473–1481.