Bagging is an Ensemble Method that combines multiple models trained on random subsets of the data to reduce variance.
Bagging, also known as Bootstrap Aggregating, combines multiple models trained on different Bootstrap samples.
These models can be trained in parallel since they are independent of one another.
CART methods (Trees) typically tend to overfit, so bagging is used to reduce their variance.
Approach
- We generate $B$ bootstrapped samples by drawing $n$ observations with replacement from the training set.
- Train a model on each sample; we denote these as $\hat{f}_1, \dots, \hat{f}_B$.
- Aggregate predictions. In the case of regression, we use the average prediction $\hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)$. In classification we would use the majority vote of the trees.
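The steps above can be sketched as follows. This is a minimal illustration assuming scikit-learn is available; the synthetic sine data, the choice of `B = 25`, and the variable names are assumptions for demonstration, not from the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative assumption)
n = 200
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

B = 25  # number of bootstrap samples
trees = []
for b in range(B):
    idx = rng.integers(0, n, size=n)  # draw n observations with replacement
    tree = DecisionTreeRegressor()    # unpruned: high variance, low bias
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Regression: aggregate by averaging the B predictions
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
y_hat = np.mean([t.predict(X_test) for t in trees], axis=0)
```

Because each tree sees an independent bootstrap sample, the `B` fits in the loop could also be dispatched in parallel.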
We typically do not prune trees in Bagging. Therefore, each tree will have high variance and low bias.
We should use a sufficiently large $B$; increasing $B$ does not cause overfitting. $B = 500$ is the default parameter for some packages (e.g., R's randomForest).
Assessment
We have that, on average, about one third of the observations from each sample will not appear, since $(1 - 1/n)^n \approx e^{-1} \approx 0.368$. The observations that do not appear are referred to as Out of Bag observations.
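This fraction is easy to verify empirically. A small sketch (the sample size `n = 10_000` is an arbitrary assumption) draws one bootstrap sample and measures the share of observations it misses:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# One bootstrap sample: n draws with replacement from {0, ..., n-1}
idx = rng.integers(0, n, size=n)

# Fraction of original observations never drawn (Out of Bag)
oob_fraction = 1 - len(np.unique(idx)) / n
# Theory: (1 - 1/n)^n -> e^{-1} ~ 0.368 as n grows
```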
We use these Out of Bag observations as a Validation Set: each observation is predicted only by the trees that did not see it during training, which gives an honest estimate of test error without a separate holdout set.