Linear Regression for Data Science in 2026
Updated on December 10, 2025 7 minutes read
Linear regression is one of the simplest and most widely used models in statistics and machine learning. It describes how a continuous target variable changes as one or more input variables change.
Even in 2026, when deep learning and large language models are common, linear regression remains a core tool. It is fast, interpretable, and often the first model you try in real-world data science projects.
We will start from the mathematical definition of linear regression and then see how to solve it in three ways: closed form for one variable, closed form for many variables, and gradient descent.
What is Linear Regression?
Suppose you have a dataset of $n$ observations

$$\{(x_i, y_i)\}_{i=1}^{n},$$

where both the inputs $x_i$ and the outputs $y_i$ are real-valued continuous variables. The goal of linear regression is to find a linear function that best predicts $y$ from $x$.
In the most general case with $d$ features, the model is

$$\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_d x_d,$$

where $a_0$ is the intercept and $a_1, \dots, a_d$ are the coefficients, also called weights. Our task is to estimate these parameters from data.
To measure how good a particular choice of parameters is, we use the least squares loss:

$$L(a_0, \dots, a_d) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$

The optimal parameters are those that minimise this loss.
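As a minimal sketch of this loss, the function below adds up the squared differences between the targets and the predictions of a linear model for one particular choice of parameters. The helper name `least_squares_loss` and the toy arrays are illustrative assumptions, not part of the original article.

```python
import numpy as np

def least_squares_loss(X, y, intercept, weights):
    """Sum of squared residuals for a linear model (hypothetical helper)."""
    y_hat = intercept + X @ weights   # one prediction per row of X
    residuals = y - y_hat
    return float(np.sum(residuals ** 2))

# Toy data, made up for illustration: 3 observations, 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
y = np.array([5.0, 4.0, 7.0])

# Two candidate parameter choices: the one with the lower loss fits better.
loss_good = least_squares_loss(X, y, intercept=0.0, weights=np.array([2.0, 1.0]))
loss_bad = least_squares_loss(X, y, intercept=0.0, weights=np.array([0.0, 0.0]))
print(loss_good, loss_bad)
```

Comparing the two values shows how the loss ranks candidate parameters: a smaller sum of squared residuals means a better fit.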
Simple Linear Regression (One Variable, Ordinary Least Squares)
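In the simplest case, each input $x_i$ is just a single number. The model becomes

$$\hat{y} = a_0 + a_1 x.$$

Here, the intercept $a_0$ and the slope $a_1$ define a straight line. Linear regression in this setting means: find the line that best fits the data points in the least squares sense.
Formally, we want

$$(\hat{a}_0, \hat{a}_1) = \arg\min_{a_0, a_1} \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2.$$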
Deriving the optimal parameters
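Define the loss

$$L(a_0, a_1) = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2.$$

To find its minimum, we set its partial derivatives to zero. This gives the normal equations

$$\frac{\partial L}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0, \qquad \frac{\partial L}{\partial a_1} = -2 \sum_{i=1}^{n} x_i (y_i - a_0 - a_1 x_i) = 0.$$

Let the sample means be

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

Solving the system leads to a well-known closed-form solution for the slope

$$\hat{a}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and the intercept

$$\hat{a}_0 = \bar{y} - \hat{a}_1 \bar{x}.$$

So the best-fit line is simply

$$\hat{y} = \hat{a}_0 + \hat{a}_1 x.$$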
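The closed-form solution can be sketched in a few lines of NumPy: the slope is the sum of products of the centred inputs and centred targets divided by the sum of squared centred inputs, and the intercept is the mean of the targets minus the slope times the mean of the inputs. The function name `fit_simple_ols` and the sample data are assumptions made for illustration.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least squares fit for one feature (hypothetical helper)."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of centred cross-products over sum of centred squares.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Made-up sample data lying roughly on the line y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

a0_hat, a1_hat = fit_simple_ols(x, y)
print(f"intercept = {a0_hat:.3f}, slope = {a1_hat:.3f}")
```

The formula requires that the inputs are not all identical, otherwise the denominator is zero and the slope is undefined.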
Multiple Linear Regression (Many Variables, Ordinary Least Squares)
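When each observation has multiple features, $x_i$ is no longer a single number but a vector of size $d$:

$$x_i = (x_{i1}, x_{i2}, \dots, x_{id})^\top.$$

The model becomes

$$\hat{y}_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + \dots + a_d x_{id}.$$

For convenience, we often work in matrix form. We stack all targets into a vector $\mathbf{y}$ and all features into a matrix $X$:
- $\mathbf{y}$ is an $n \times 1$ vector of targets.
- $X$ is an $n \times d$ design matrix where each row is an observation and each column is a feature.
- $\mathbf{a}$ is a $d \times 1$ parameter vector $(a_1, \dots, a_d)^\top$.
If we include the intercept as a column of ones in $X$ (so that $X$ has $d + 1$ columns and $\mathbf{a}$ gains the entry $a_0$), we can write the predictions compactly as

$$\hat{\mathbf{y}} = X \mathbf{a}.$$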
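To make the column-of-ones trick concrete, here is a small sketch that prepends a column of ones to a feature matrix so that a single matrix-vector product yields the predictions, intercept included. The helper `add_intercept_column` and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def add_intercept_column(X):
    """Prepend a column of ones to the feature matrix (hypothetical helper)."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])

# 3 observations with 2 features each (made-up values).
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.0],
                  [3.0, 1.0]])
X = add_intercept_column(X_raw)     # shape (3, 3): ones, feature 1, feature 2

# Arbitrary parameters: [intercept, weight_1, weight_2].
a = np.array([0.5, 2.0, 1.0])

y_hat = X @ a                       # same as 0.5 + 2.0 * x1 + 1.0 * x2 per row
print(X.shape, y_hat)
```

With this convention the intercept is just another entry of the parameter vector, which keeps the later matrix formulas free of special cases.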
Loss function in matrix form
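The least squares loss becomes

$$L(\mathbf{a}) = \lVert \mathbf{y} - X\mathbf{a} \rVert^2 = (\mathbf{y} - X\mathbf{a})^\top (\mathbf{y} - X\mathbf{a}).$$

Expanding this expression gives

$$L(\mathbf{a}) = \mathbf{y}^\top \mathbf{y} - 2\, \mathbf{a}^\top X^\top \mathbf{y} + \mathbf{a}^\top X^\top X \mathbf{a}.$$

We want to minimise $L$ with respect to $\mathbf{a}$. The term $\mathbf{y}^\top \mathbf{y}$ does not depend on $\mathbf{a}$, so its derivative is zero, and we can ignore it when taking the gradient.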
Normal equation for multiple linear regression
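Taking the gradient of $L$ with respect to $\mathbf{a}$ and setting it to zero yields

$$\nabla_{\mathbf{a}} L = -2 X^\top \mathbf{y} + 2 X^\top X \mathbf{a} = 0.$$

Rearranging, we obtain the normal equation

$$X^\top X \mathbf{a} = X^\top \mathbf{y}.$$

If $X^\top X$ is invertible, the unique least squares solution is

$$\hat{\mathbf{a}} = (X^\top X)^{-1} X^\top \mathbf{y}.$$

In practice, for large problems or when $X^\top X$ is ill-conditioned, you may use numerical methods such as QR decomposition or singular value decomposition, or regularisation techniques like ridge regression, instead of computing $(X^\top X)^{-1}$ directly.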
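The sketch below compares three ways of obtaining the least squares parameters with NumPy: solving the normal equation $X^\top X \mathbf{a} = X^\top \mathbf{y}$ as a linear system (no explicit inverse), calling the SVD-based `np.linalg.lstsq`, and a simple ridge variant. The synthetic data, the "true" parameters, and the penalty `lam` are assumptions chosen for illustration only.

```python
import numpy as np

# Synthetic data (illustrative): 200 observations, 3 features, small noise.
rng = np.random.default_rng(0)
n, d = 200, 3
X_raw = rng.normal(size=(n, d))
X = np.hstack([np.ones((n, 1)), X_raw])        # intercept as a column of ones
true_a = np.array([1.0, 2.0, -1.0, 0.5])       # assumed parameters, intercept first
y = X @ true_a + 0.1 * rng.normal(size=n)

# 1) Normal equation, solved as a linear system rather than via an inverse.
a_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2) SVD-based least squares, better behaved when X^T X is ill-conditioned.
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3) Ridge regression: adding lam * I keeps the system well-conditioned.
#    Note: this simple version also penalises the intercept.
lam = 1e-3
a_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ y)

print(a_normal, a_lstsq, a_ridge, sep="\n")
```

On well-conditioned data like this, all three estimates should nearly coincide; the differences matter when features are strongly correlated or when there are more features than observations.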
Solving Linear Regression with Gradient Descent
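The closed-form solutions above are elegant, but they can become expensive when the number of observations $n$ and the number of features $d$ are very large: forming $X^\top X$ takes on the order of $n d^2$ operations, and solving the resulting system takes on the order of $d^3$. With large datasets, we often use gradient descent instead.
Gradient descent is an iterative optimisation algorithm. Starting from an initial guess $\mathbf{a}^{(0)}$, we repeatedly update the parameters in the opposite direction of the gradient of the loss:

$$\mathbf{a}^{(t+1)} = \mathbf{a}^{(t)} - \eta \, \nabla L(\mathbf{a}^{(t)}),$$

where $\eta$ is the learning rate, a positive scalar that controls the step size.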
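A minimal sketch of this update rule for the least squares loss $\lVert \mathbf{y} - X\mathbf{a} \rVert^2$, whose gradient is $2 X^\top (X\mathbf{a} - \mathbf{y})$, might look as follows. The function name, the tolerance-based stopping rule, and the toy data are assumptions for illustration; the learning rate was picked by hand for this data and would need tuning elsewhere.

```python
import numpy as np

def gradient_descent(X, y, lr, num_steps, tol=1e-10):
    """Batch gradient descent on ||y - X a||^2 (hypothetical helper)."""
    a = np.zeros(X.shape[1])                 # initial guess
    for _ in range(num_steps):
        grad = 2 * X.T @ (X @ a - y)         # gradient of the loss at a
        a_new = a - lr * grad                # step against the gradient
        if np.linalg.norm(a_new - a) < tol:  # stop once updates are tiny
            a = a_new
            break
        a = a_new
    return a

# Toy data on the line y = 1 + 2x; the first column of X is the intercept.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

a_gd = gradient_descent(X, y, lr=0.01, num_steps=20_000)
print(a_gd)
```

If the learning rate is too large the iterates diverge, and if it is too small convergence is slow, so in practice it is treated as a hyperparameter to tune.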
Gradient descent for simple linear regression
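For the one-variable model

$$\hat{y} = a_0 + a_1 x,$$

the loss is

$$L(a_0, a_1) = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2.$$

The partial derivatives are

$$\frac{\partial L}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i), \qquad \frac{\partial L}{\partial a_1} = -2 \sum_{i=1}^{n} x_i (y_i - a_0 - a_1 x_i).$$

Applying gradient descent, we update both parameters at each step as

$$a_0 \leftarrow a_0 - \eta \, \frac{\partial L}{\partial a_0}, \qquad a_1 \leftarrow a_1 - \eta \, \frac{\partial L}{\partial a_1},$$

where $\eta$ is the learning rate.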
We repeat these updates until the parameters change very little or the loss stops decreasing.
Pseudocode example
Here is a simple batch gradient descent loop for linear regression with one feature. The code uses vector operations for clarity.
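```python
import numpy as np

# Toy data (illustrative only): points on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

a0, a1 = 0.0, 0.0     # initial parameters
lr = 0.001            # learning rate
num_epochs = 10_000   # number of full passes over the data

for epoch in range(num_epochs):
    y_hat = a0 + a1 * x                # predictions
    error = y - y_hat                  # residuals
    grad_a0 = -2 * error.sum()         # dL/da0
    grad_a1 = -2 * (x * error).sum()   # dL/da1
    a0 = a0 - lr * grad_a0
    a1 = a1 - lr * grad_a1
```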
In real projects, you might use stochastic or mini-batch gradient descent, learning rate schedules, or optimisers like Adam, especially in larger machine learning pipelines.
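As a rough sketch of the mini-batch variant, the loop below shuffles the data each epoch and updates the two parameters using only a small batch at a time. Note one deliberate difference from the batch version: the gradient is averaged over the batch rather than summed, so the step size does not depend on the batch size. The function name, default hyperparameters, and synthetic data are all assumptions for illustration.

```python
import numpy as np

def minibatch_sgd(x, y, lr=0.05, batch_size=16, num_epochs=200, seed=0):
    """Mini-batch gradient descent for y ≈ a0 + a1 * x (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a0, a1 = 0.0, 0.0
    n = len(x)
    for _ in range(num_epochs):
        order = rng.permutation(n)                # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = y[idx] - (a0 + a1 * x[idx])   # residuals on this batch
            # Mean (not sum) of the per-example gradients over the batch.
            grad_a0 = -2 * error.mean()
            grad_a1 = -2 * (x[idx] * error).mean()
            a0 -= lr * grad_a0
            a1 -= lr * grad_a1
    return a0, a1

# Synthetic data around the line y = 1 + 2x with a little noise.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=256)
y = 1.0 + 2.0 * x + 0.05 * rng.normal(size=256)

a0_sgd, a1_sgd = minibatch_sgd(x, y)
print(a0_sgd, a1_sgd)
```

Because each step sees only part of the data, the estimates fluctuate slightly around the least squares solution; a decaying learning rate is one common way to damp that noise.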
Closed form vs gradient descent: when to use which?
Both approaches solve the same optimisation problem, but are useful in different situations.
Closed form (normal equation) is ideal when the number of features is relatively small, and you can safely compute $(X^\top X)^{-1}$ or use an equivalent numerical solver.
Gradient descent scales better to very large datasets and feature spaces, and is easy to integrate into end-to-end machine learning pipelines.
Many modern libraries choose efficient numerical methods under the hood, so understanding both views helps you interpret and debug your models.
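To check that both approaches really reach the same answer, the sketch below fits the same synthetic data with a closed-form solver (`np.linalg.lstsq`) and with plain batch gradient descent, then reports the largest difference between the two parameter vectors. The data, learning rate, and number of steps are assumptions picked so that gradient descent converges on this small example.

```python
import numpy as np

# Synthetic one-feature data around y = 3 - 1.5x (illustrative only).
rng = np.random.default_rng(42)
n = 100
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])   # intercept column plus the feature
y = 3.0 - 1.5 * x + 0.1 * rng.normal(size=n)

# Closed-form route: SVD-based least squares solver.
a_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Iterative route: batch gradient descent on the sum of squared residuals.
a_gd = np.zeros(2)
lr = 0.001
for _ in range(5_000):
    grad = 2 * X.T @ (X @ a_gd - y)
    a_gd = a_gd - lr * grad

max_gap = np.max(np.abs(a_closed - a_gd))
print(a_closed, a_gd, max_gap)
```

A negligible gap confirms that the two methods differ in how they compute the minimiser, not in which minimiser they find.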
To practise these concepts in real projects, consider joining our live online Data Science and AI Bootcamp, where you will implement linear regression and many other models from scratch.
Quick quiz
Test your understanding with a short quiz. The correct answer is given after each question.
- What is the formula of the optimal parameter vector in the multidimensional case?
  - a).
  - b).
  - c). $\hat{\mathbf{a}} = (X^\top X)^{-1} X^\top \mathbf{y}$
  Answer: (c)
- Why do we set the derivative of the loss to zero in ordinary least squares?
  - a). To find the extremum (minimum) of the loss function.
  - b). To minimise the derivative itself.
  - c). To keep only the real part of the derivative.
  Answer: (a)
- What is the main objective of linear regression?
  - a). To find the line that passes exactly through all the points.
  - b). To find the line or hyperplane that best describes the data in the least squares sense.
  - c). To find the line that best separates the data into classes.
  Answer: (b)
Next steps
If you understand this article, you already have a strong foundation for more advanced regression methods like regularised linear models, logistic regression, and Gaussian processes.
No degree? No problem. You can still become a Data Scientist with Code Labs Academy and build job-ready skills for the AI era.