Linear regression is a supervised machine learning model. There are actually several different linear models for regression in machine learning, but in this video, when we say "linear regression", we will be referring to Ordinary Least Squares (OLS) linear regression, which is the most common form.
In middle school algebra, we learned that a line takes the form:
y = mx + b
Where m is the slope, x is the input value, b is the y-intercept, and y is the output value.
Linear regression works by fitting a line of best fit through your data points, specifically the line that minimizes the sum of squared errors. An "error" (also called a residual) is the difference between an actual value and the predicted value. What this really means is that the algorithm finds the "m" and "b" that give you the best possible predictions, where "best" means the smallest total squared error.
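To make this concrete, here is a minimal sketch of the closed-form OLS solution for a single feature. The data values are made up purely for illustration:

import numpy as np

# Toy data, invented for this example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS for one feature:
# m = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2), b = y_mean - m * x_mean
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

# The sum of squared errors that this (m, b) pair minimizes
sse = np.sum((y - (m * x + b)) ** 2)
print(m, b, sse)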
OLS linear regression is typically evaluated using the R-squared metric. This metric represents the proportion of variance in the dependent variable that can be explained by your independent variable(s). It ranges from 0 to 1, with 0 meaning your independent variables explain 0% of the variance in your dependent variable and 1 meaning they explain 100% of it. A higher R-squared value is better. (Note that when R-squared is computed on held-out test data, as scikit-learn's score method does, it can actually come out negative if the model fits worse than simply predicting the mean.)
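Here is a small sketch of how R-squared is computed by hand, using made-up actual and predicted values; this is the same quantity scikit-learn's score method returns for regressors:

import numpy as np

# Actual and predicted values, invented for illustration
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

# R-squared = 1 - (sum of squared residuals / total sum of squares)
ss_res = np.sum((y_actual - y_pred) ** 2)
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(r_squared)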
Here is what this looks like in scikit-learn, using a mock dataset of ages and incomes:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the mock dataset
data = pd.read_csv("MOCK_Income_Data.csv")

# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(data["Age"], data["Income"])

# scikit-learn expects a 2-D feature array, so reshape the 1-D "Age" column
x_train = np.array(x_train).reshape(-1, 1)
x_test = np.array(x_test).reshape(-1, 1)

# Fit the OLS model and report R-squared on the held-out test set
reg_model = LinearRegression().fit(x_train, y_train)
print(reg_model.score(x_test, y_test))
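As a quick follow-up, the fitted model exposes the learned "m" and "b" and can predict on new inputs. The age value 35 below is just a hypothetical input for illustration:

# The learned slope ("m") and intercept ("b")
print(reg_model.coef_, reg_model.intercept_)

# Predict income for a hypothetical 35-year-old; the input must be 2-D
print(reg_model.predict(np.array([[35]])))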
Pros