Overview

Regularization discourages overly complex models and reduces variance. It regularizes regression by adding a penalty term to the least-squares cost function:

    Cost(β) = Σᵢ (yᵢ − ŷᵢ)² + λ · Penalty(β)

We have λ · Penalty(β) as a penalty term, where λ ≥ 0 is a regularization parameter that controls its strength.
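As a concrete sketch of the cost function above (taking the loss to be the residual sum of squares; the function names `ridge_cost` and `lasso_cost` are hypothetical, chosen for illustration):

```python
import numpy as np

def ridge_cost(X, y, beta, lam):
    """Residual sum of squares plus an L2 penalty on the coefficients."""
    residuals = y - X @ beta
    return residuals @ residuals + lam * np.sum(beta ** 2)

def lasso_cost(X, y, beta, lam):
    """Residual sum of squares plus an L1 penalty on the coefficients."""
    residuals = y - X @ beta
    return residuals @ residuals + lam * np.sum(np.abs(beta))
```

At λ = 0 both reduce to the plain RSS; only the form of the penalty differs between the two methods.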

Regression Method    Penalty Term
Ridge Regression     λ Σⱼ βⱼ²    (L2)
LASSO                λ Σⱼ |βⱼ|   (L1)

If we set λ = 0, then we have OLS.

As we increase λ, we would have larger and larger penalty values. Thus, there will be a tradeoff between the size of the coefficients and the model's fit on the training data. This penalty term will reduce variance and introduce more bias.
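A quick numerical sketch of both claims, using the closed-form ridge estimate (XᵀX + λI)⁻¹Xᵀy on hypothetical synthetic data: at λ = 0 the estimate coincides with OLS, and the coefficients shrink as λ grows.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: (X^T X + lam*I)^(-1) X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Synthetic data for illustration (true coefficients chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=100)

# lam = 0 recovers the OLS solution.
ols = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(ridge_fit(X, y, 0.0), ols)

# Larger lam -> larger penalty -> smaller coefficients (more bias, less variance).
norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
assert all(a > b for a, b in zip(norms, norms[1:]))
```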

Geometric Intuition (2D)

For Ridge and Lasso, there is a 2D geometric interpretation. For all pairs of coefficients β₁ and β₂, the penalty bounds the possible coefficient pairs within a constraint region: a diamond for Lasso and a circle for Ridge.

The rings are the contour values produced by the loss function for each pair of β₁ and β₂.

The centre of the rings is where the loss function is minimized. With the penalty term in mind, we wish to find the pair inside the constraint region with the shortest (straight-line) distance to the centre. In the Lasso case, this often lands on a corner of the diamond, so one of the coefficients is "snapped" to zero. In the Ridge case, the solution keeps a combination of both coefficients, shrunken but nonzero.

ridge_lasso_2d.png

Note that, in general, as λ increases, the constraint region (circle, diamond) will become smaller.

In addition, as λ increases, the coefficients will become smaller.
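Both effects — coefficients shrinking as λ grows, and Lasso snapping weak coefficients to exactly zero — can be sketched with a small coordinate-descent Lasso solver. This is a hypothetical from-scratch implementation on synthetic data, not a library routine:

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise RSS + lam * sum(|beta_j|) by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # Soft-thresholding: this is what "snaps" coefficients to exactly zero.
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / col_sq[j]
    return beta

# Synthetic data: two strong features, three irrelevant ones.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([4.0, 0.0, 0.0, -3.0, 0.0]) + rng.normal(scale=0.3, size=100)

beta = lasso_coordinate_descent(X, y, lam=50.0)
assert beta[0] > 0 and beta[3] < 0                      # strong signals survive, shrunken
assert beta[1] == 0 and beta[2] == 0 and beta[4] == 0   # weak ones snapped to zero
```

Ridge shrinkage is smooth and never produces exact zeros; the L1 soft-threshold step above is precisely the corner-of-the-diamond behaviour from the geometric picture.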