Boosting is another ensemble method.

Boosting combines the outcomes of many trees to reduce bias.

In boosting, trees are grown sequentially, and the predictions of the trees are added together. In the case of boosting, smaller trees are preferred.

Method

In boosting, a sequence of decision trees is grown, with the number of splits per tree, $d$, commonly kept small (often $d = 1$, a stump). We refer to these trees as “weak learners”.

Each tree is fit to the residuals obtained from the previous step rather than to the outcome directly. At each iteration, a learning parameter $\lambda$ is multiplied by the tree's predictions to slow learning.

  1. Set $\hat{f}(x) = 0$ and $r_i = y_i$ for all $i$ in the training set
  2. For $b = 1, 2, \ldots, B$ do
    1. Fit a tree $\hat{f}^b$ with $d$ splits ($d + 1$ terminal nodes) to the training data $(X, r)$
    2. We add a shrunken version of the decision tree: $\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^b(x)$
    3. Update the residuals: $r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)$
  3. Output the boosted model: $\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^b(x)$
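The steps above can be sketched in code. This is a minimal illustration, not the reference implementation: the function names `boost` and `predict` are my own, and scikit-learn's `DecisionTreeRegressor` stands in as the weak learner, with `max_leaf_nodes = d + 1` giving exactly $d$ splits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, B=100, lam=0.01, d=1):
    """Fit B shrunken trees with d splits each to successive residuals."""
    r = y.astype(float).copy()   # step 1: residuals start as the outcomes
    trees = []
    for _ in range(B):           # step 2: grow trees sequentially
        # fit a tree with d splits (d + 1 terminal nodes) to (X, r)
        tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
        r -= lam * tree.predict(X)   # step 2c: shrink and update residuals
        trees.append(tree)
    return trees

def predict(trees, X, lam=0.01):
    # step 3: the boosted model is the sum of the shrunken trees
    return lam * sum(tree.predict(X) for tree in trees)
```

With small $\lambda$, each tree contributes only a little, which is why many iterations $B$ are needed.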

Parameters

$B$ is the number of fits (trees). $\lambda$ is the learning rate, with typical values 0.01 or 0.001. $d$ is the interaction depth, or complexity, of each tree; for $d$ internal nodes there are $d + 1$ terminal nodes.

Advantages

Improved Accuracy. The boosting procedure slowly “corrects” what the previous models did not account for.
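One way to see this slow correction, as an assumed illustration, is scikit-learn's `staged_predict`, which yields the model's predictions after each tree is added; the training error falls as $B$ grows.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.05, max_depth=1, random_state=0
).fit(X, y)

# staged_predict yields predictions after 1, 2, ..., B trees
errors = [np.mean((y - pred) ** 2) for pred in model.staged_predict(X)]
print(errors[0], errors[-1])  # training MSE after the first and last tree
```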

Reduced Overfitting. The boosting algorithm starts with “weak learners” (e.g. stumps) which fit slowly to the data.

Robustness. Boosting is able to handle noisy data and adapts well to misclassified points.