Regularization: Ridge and Lasso - Linear Regression

Today's Agenda

  1. What is Regularization?
  2. Quick recap of linear regression
  3. Ridge Regression Details
  4. Lasso Regression Details
  5. Implementation of Lasso and Ridge Regression
  6. When to Use Lasso and Ridge Regression?

What is Regularization?

Regularization is a method of reducing your chances of overfitting your model. Ridge regression and lasso regression are two ways you can take a regression model and reduce overfitting. Regularization does this by adding an additional constraint (a penalty on large coefficients) to the model.

Quick recap of linear regression

Remember, linear regression fits a line of the form y = mx + b by choosing the slope and intercept that minimize the sum of squared errors (ordinary least squares).

Ridge Regression Details

Ridge regression uses the same ordinary least squares objective, but with an additional constraint: it chooses coefficients that fit the data well while also keeping the coefficients as small as possible. In other words, the slope (or coefficient) of every independent variable is pulled toward zero. The strength of this penalty is controlled by the regularization parameter, denoted alpha, which can be any non-negative number. Ridge regression is also called L2 regularization because the penalty is the sum of the squared coefficients. Increasing alpha pushes the coefficients closer to zero, and an alpha of zero makes the model essentially an ordinary linear regression. You should choose your alpha value by experimenting with different values and seeing which one gives you the highest r-squared or adjusted r-squared value on held-out data.
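The alpha-selection procedure described above can be sketched as a simple loop. This is a minimal illustration on synthetic data (the dataset and the candidate alpha values here are assumptions for the example, not part of the original notebook):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y depends on two features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Try several alpha values and keep the one with the best r-squared on the test set.
best_alpha, best_score = None, -np.inf
for alpha in [0.01, 0.1, 1, 10, 100]:
    score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)
```

With real data you would typically use cross-validation rather than a single train/test split, but the idea is the same: alpha is tuned empirically.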

Lasso Regression Details

An alternative to ridge regression is lasso regression, also called L1 regularization. Lasso regression differs from ridge regression in that, whereas ridge regression only pushes coefficients toward zero, lasso regression can push coefficients EXACTLY TO ZERO. When a coefficient is pushed to zero, that independent variable has no effect on the model. This can show you which variables should be in your model. Lasso also has a regularization parameter called alpha: the higher the alpha, the more variables are pushed to zero, and an alpha of 0 is the same as not having regularization at all. You should choose your alpha value by experimenting with different values of alpha and seeing which one gives you the highest r-squared or adjusted r-squared value.
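The zeroing-out behavior described above is easy to see on synthetic data. This is a minimal sketch (the data and the alpha value are assumptions chosen so the effect is visible):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumed for illustration): only the first of three
# features actually drives y; the other two are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 5 * X[:, 0] + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.5).fit(X, y)
print(model.coef_)  # coefficients of the irrelevant features are driven to zero
```

The fitted coefficient vector keeps a large value for the informative feature and exact zeros for the irrelevant ones, which is how lasso doubles as a variable-selection tool.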

Implementation of Lasso and Ridge Regression in Python

In [62]:
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
In [64]:
data = pd.read_csv("MOCK_Income_Data.csv")
In [65]:
data.head()
Out[65]:
Experience Bachelors Masters PhD Age Income
0 0 0 0 0 21 21000
1 0 0 0 0 22 23450
2 1 0 0 0 24 84000
3 1 0 0 0 25 29000
4 1 0 0 0 26 35000
In [66]:
X = data[["Experience","Bachelors","Masters","PhD","Age"]]
Y = data[["Income"]]
In [67]:
x_train, x_test, y_train, y_test = train_test_split(X,Y)
In [68]:
x_train
Out[68]:
Experience Bachelors Masters PhD Age
11 21 1 1 1 40
15 21 1 0 0 45
19 39 0 0 0 57
3 1 0 0 0 25
13 19 1 1 0 41
... ... ... ... ... ...
16 25 1 0 0 52
12 18 1 1 0 40
8 12 0 0 0 32
17 26 0 0 0 54
0 0 0 0 0 21
In [69]:
ridge_model = Ridge(alpha=1).fit(x_train,y_train)
In [70]:
ridge_model.score(x_test,y_test)
Out[70]:
0.7099549559117013
In [71]:
lasso_model = Lasso(alpha=1).fit(x_train,y_train)
In [72]:
lasso_model.score(x_test,y_test)
Out[72]:
0.5359029485524133

When to Use Lasso and Ridge Regression?

When you have a lot of data points (or rows of data), regularization becomes less important because your model has enough examples to learn to generalize well. However, if you have a lot of variables, then you may want to use lasso regularization to see which ones affect the dependent variable the most.

Lasso is easier to interpret than ridge because it reduces the number of variables. You can also use scikit-learn's ElasticNet class, which combines the lasso and ridge penalties, but then you have two regularization parameters to tune.
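As a quick sketch of the ElasticNet option mentioned above (synthetic data assumed for illustration): ElasticNet takes both an overall penalty strength and an l1_ratio that sets the mix between the lasso (L1) and ridge (L2) penalties.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): two informative features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# alpha sets the overall penalty strength; l1_ratio=0.5 means an even
# mix of the lasso and ridge penalties (1.0 is pure lasso, 0.0 pure ridge).
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_train, y_train)
print(model.score(X_test, y_test))
```

In practice you would tune alpha and l1_ratio together, for example by trying a grid of values the same way alpha was tuned for ridge and lasso above.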