18 Types of Regression in Machine Learning [Explained With Examples]
Updated on Feb 21, 2025 | 47 min read | 290.9k views
Regression in machine learning is a core technique used to model the relationship between a dependent variable (target) and one or more independent variables (features). Unlike classification (which predicts categories), regression deals with continuous numeric outcomes.
In simple terms, regression algorithms try to find a best-fit line or curve that can predict output values (Y) from input features (X). This makes regression analysis essential for data science tasks like forecasting and trend analysis.
There are many different types of regression models, each suited to specific kinds of problems and data. From straightforward Linear Regression to advanced techniques like Ridge/Lasso regularization and Decision Tree regression, knowing the distinctions is crucial. This guide will explore 18 different regression models and their real-world applications.
Below is a concise overview of the 18 types of regression in machine learning, each suited to different data characteristics and modeling goals. Use this table to quickly recall their primary applications or when you might consider each method.
Regression Type | Primary Use |
1. Linear Regression | Baseline model for continuous outcomes under linear assumptions. |
2. Logistic Regression | Classification tasks (binary or multiclass) with interpretable log-odds. |
3. Polynomial Regression | Modeling curved relationships by adding polynomial terms. |
4. Ridge Regression | L2-penalized linear model to reduce variance and handle multicollinearity. |
5. Lasso Regression | L1-penalized linear model for feature selection and sparsity. |
6. Elastic Net Regression | Combination of L1 and L2 penalties balancing shrinkage and selection. |
7. Stepwise Regression | Iterative feature selection for simpler exploratory models. |
8. Decision Tree Regression | Rule-based splits handling non-linear effects with interpretability. |
9. Random Forest Regression | Ensemble of trees for better accuracy and reduced overfitting. |
10. Support Vector Regression (SVR) | Flexible function fitting with margin-based, kernel-driven approach. |
11. Principal Component Regression (PCR) | Dimensionality reduction first, then regression on principal components. |
12. Partial Least Squares (PLS) Regression | Supervised dimensionality reduction focusing on variance relevant to y. |
13. Bayesian Regression | Incorporates prior knowledge and provides uncertainty estimates. |
14. Quantile Regression | Predicting specific quantiles (median, tails) for robust analysis. |
15. Poisson Regression | Count data modeling under assumption that mean ≈ variance. |
16. Cox Regression | Time-to-event analysis handling censored data in survival settings. |
17. Time Series Regression | Forecasting with temporal structures and autocorrelation. |
18. Panel Data Regression | Modeling multiple entities across time, controlling for unobserved heterogeneity. |
Now that you have seen the types at a glance, let’s explore all the regression models in detail.
Please Note: All code snippets for regression types explained below are in Python with common libraries like scikit-learn. You can run them to see how each regression model works in practice.
Linear Regression in machine learning is the most fundamental and widely used regression technique. It assumes a linear relationship between the independent variable(s) X and the dependent variable Y. The model tries to fit a straight line (in multi-dimensional space, a hyperplane) that best approximates all the data points.
The simplest form is Simple Linear Regression with one feature – find the formula below:
y = b0 + b1*x + e
In the equation above, y is the predicted value, b0 is the intercept, b1 is the slope (the coefficient for feature x), and e is the error term.
For multiple features, there’s Multiple Linear Regression – find the formula below:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn + e
In the equation above, b0 is the intercept, b1 through bn are the coefficients for features x1 through xn, and e is the error term.
The end goal is to find the β values that minimize the error (often using Least Squares to minimize the sum of squared errors between predicted and actual y).
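To see concretely what minimizing squared error produces, here is a minimal NumPy sketch of the closed-form (normal-equation) solution on the same toy house-price numbers used in the snippet below. This is purely illustrative – scikit-learn performs this estimation for you.
import numpy as np
# Illustrative data: a column of ones for the intercept plus one feature (house size)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([150.0, 200.0, 250.0, 300.0])
# Normal equation: beta = (X^T X)^(-1) X^T y minimizes the sum of squared errors
beta = np.linalg.solve(X.T @ X, X.T @ y)
print("Intercept (b0):", beta[0])  # expected to be ~100
print("Slope (b1):", beta[1])      # expected to be ~50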
Key Characteristics of Linear Regression
Code Snippet
Below, you will fit a simple linear regression using scikit-learn’s LinearRegression class. This example assumes you have training data X_train (2D array of features) and y_train (target values). You can then predict on test data X_test.
from sklearn.linear_model import LinearRegression
# Sample training data
X_train = [[1.0], [2.0], [3.0], [4.0]] # e.g., feature = size of house (1000s of sq ft)
y_train = [150, 200, 250, 300] # e.g., target = price (in $1000s)
# Train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Coefficients and intercept
print("Slope (β1):", model.coef_[0])
print("Intercept (β0):", model.intercept_)
# Predict on new data
X_test = [[5.0]] # e.g., a 5 (i.e., 5000 sq ft) house
pred = model.predict(X_test)
print("Predicted price for 5000 sq ft house:", pred[0])
Output
This code outputs the learned slope and intercept for a simple linear regression, then predicts the price for a 5000 sq ft house:
Slope (β1): 50.0
Intercept (β0): 100.0
Predicted price for 5000 sq ft house: 350.0
Real-World Applications of Linear Regression
Example | Use Case |
House Prices | Predicts price based on size, location, and features. |
Sales Forecasting | Estimates sales from ad spend, seasonality, etc. |
Student Scores | Models exam scores based on study hours. |
Salary Estimation | Predicts salary from years of experience. |
Also Read: Assumptions of Linear Regression
Logistic Regression in machine learning is a popular technique for classification problems (especially binary classification), but it is often taught alongside regression models because it uses a regression-like approach with a non-linear transformation.
Instead of fitting a straight line, logistic regression fits an S-shaped sigmoid curve – find the formula below:
sigmoid(z) = 1 / (1 + e^(-z))
In the equation above, z is the usual linear combination of the features:
z = b0 + b1*x1 + b2*x2 + ... + bn*xn
The model is typically trained by maximizing the likelihood (or equivalently minimizing log-loss) rather than least squares. Logistic regression assumes the log-odds of the outcome is linear in X.
For a binary outcome, it outputs a probability, and you decide on a threshold (like 0.5) to classify it as 0 or 1.
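To make the sigmoid-plus-threshold mechanics concrete, here is a minimal sketch using made-up coefficients (b0 and b1 below are illustrative values, not learned from data):
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
# Hypothetical learned parameters for 'hours studied'
b0, b1 = -4.0, 1.5
hours = 2.5
z = b0 + b1 * hours            # the linear combination (log-odds)
p = sigmoid(z)                 # squashed into a probability between 0 and 1
label = 1 if p >= 0.5 else 0   # apply the 0.5 decision threshold
print(f"P(pass | {hours} h) = {p:.3f} -> class {label}")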
Key Characteristics of Logistic Regression
Code Snippet
Below is how you might train a logistic regressor for a binary classification (e.g., predict if a student passed an exam (1) or not (0) based on hours studied).
from sklearn.linear_model import LogisticRegression
# Sample training data
X_train = [[1], [2], [3], [4]] # hours studied
y_train = [0, 0, 1, 1] # 0 = failed, 1 = passed
# Train Logistic Regression model
clf = LogisticRegression()
clf.fit(X_train, y_train)
# Predict probabilities for a new student who studied 2.5 hours
prob = clf.predict_proba([[2.5]])[0][1]
print("Probability of passing with 2.5 hours of study: %.3f" % prob)
Output
The predict_proba method gives the probability of each class. Here, 2.5 hours sits exactly between the failing examples (1–2 hours) and the passing ones (3–4 hours), so the model outputs a probability of about 0.5, which you would compare to a threshold (0.5 by default in predict) to classify as fail (0) or pass (1).
Probability of passing with 2.5 hours of study: 0.500
Real-World Applications of Logistic Regression
Example | Use Case |
Spam Detection | Classifies emails as spam or not spam. |
Medical Diagnosis | Predicts disease presence based on test results. |
Credit Default | Identifies risky borrowers. |
Marketing Response | Predicts if a customer will buy after seeing an ad. |
Logistic regression shines in its simplicity and interpretability (via odds ratios). It’s a great first choice for binary classification and one of the essential regression analysis types in machine learning (albeit for categorical outcomes).
Also Read: Difference Between Linear and Logistic Regression: A Comprehensive Guide for Beginners in 2025
Polynomial Regression extends linear regression by adding polynomial terms to the model. It is useful when the relationship between the independent and dependent variables is non-linear (curved) but can be approximated by a polynomial curve.
In essence, you create new features as powers of the original feature(s) and then perform linear regression on the expanded feature set.
For example, a quadratic regression on one feature x would use x^2 as an additional feature – find the formula below:
y = b0 + b1*x + b2*x^2 + e
In general, for a polynomial of degree d:
y = b0 + b1*x + b2*x^2 + ... + bd*x^d + e
This is still a linear model in terms of the coefficients (β’s), but the features are non-linear (powers of x). Polynomial regression can capture curvature by fitting a polynomial line instead of a straight line.
Note that polynomial regression can be done with multiple features, too (including interaction terms), though it quickly increases the number of terms.
Key Characteristics of Polynomial Regression
Code Snippet
Here’s an illustration of polynomial regression by fitting a quadratic curve. You’ll use PolynomialFeatures to generate polynomial features and then a linear regression on those:
If you run the code, you will see the coefficients and the prediction for a new input.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# Sample training data (X vs y with a non-linear relationship)
X_train = np.array([[1], [2], [3], [4], [5]]) # e.g., years of experience
y_train = np.array([2, 5, 10, 17, 26]) # e.g., performance metric that grows non-linearly
# Transform features to include polynomial terms up to degree 2 (quadratic)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_train) # adds X^2 as a feature
# Fit linear regression on the polynomial features
poly_model = LinearRegression().fit(X_poly, y_train)
print("Learned coefficients:", poly_model.coef_)
print("Learned intercept:", poly_model.intercept_)
# Predict on a new data point (e.g., 6 years of experience)
new_X = np.array([[6]])
new_X_poly = poly.transform(new_X)
pred_y = poly_model.predict(new_X_poly)
print("Predicted performance for 6 years:", pred_y[0])
Output
Depending on floating-point precision, the exact numbers may differ slightly, but they will generally show:
Learned coefficients: [0. 1.]
Learned intercept: 1.0
Predicted performance for 6 years: 37.0
Real-World Applications of Polynomial Regression
Example Scenario | Description |
Economics – Diminishing Returns | Model diminishing returns (like ad spend vs. sales). |
Growth Curves | Approximate certain growth patterns with a polynomial. |
Physics – Trajectories | Predict projectile motion with a quadratic term. |
Trend Analysis | Fit non-linear trends in data with polynomial terms. |
Polynomial regression is basically performing a non-linear regression in machine learning while still using the efficient linear regression solvers. If the curve is too complex, other methods (like decision trees) might be more appropriate, but polynomial regression is a quick way to try to capture non-linearity.
Ridge Regression is a linear regression variant that addresses some limitations of ordinary least squares by adding a regularization term (penalty) to the loss function. It is also known as L2 regularization.
The ridge regression minimizes a modified cost function – the formula is given below:
Cost_ridge = Σ(from i=1 to N) [ (y_i - ŷ_i)^2 ] + λ * Σ(from j=1 to p) [ (β_j)^2 ]
In the equation above, y_i are the observed values, ŷ_i the predictions, β_j the coefficients, p the number of features, and λ ≥ 0 the regularization strength controlling how strongly the coefficients are shrunk.
This penalty term shrinks the coefficients towards zero (but unlike Lasso, it never fully zeros them out). By adding this bias, ridge regression can reduce variance at the cost of a bit of bias, helping to prevent overfitting.
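As a quick illustration of how the L2 penalty enters the math, below is a minimal NumPy sketch of the closed-form ridge solution on synthetic, nearly collinear features. Everything here (data, λ value) is illustrative; in practice you would use sklearn's Ridge as shown further below.
import numpy as np
# Synthetic data with two nearly collinear features
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=50)
lam = 1.0  # regularization strength (λ)
Xc = X - X.mean(axis=0)   # center so the intercept is not penalized
yc = y - y.mean()
# Closed-form solutions: OLS vs. ridge = (X^T X + λI)^(-1) X^T y
w_ols = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
w_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(2), Xc.T @ yc)
print("OLS coefficients:  ", w_ols)    # may be large/opposite-signed due to collinearity
print("Ridge coefficients:", w_ridge)  # shrunken and more stable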
When to Use?
Ridge regression is useful when predictors are highly correlated (multicollinearity) and when the number of features is large relative to the number of data points.
Range of λ: λ = 0 reduces ridge to ordinary least squares, while larger values shrink the coefficients more aggressively toward zero; in practice, λ is tuned via cross-validation.
Key Characteristics of Ridge Regression
Code Snippet
Using scikit-learn’s Ridge class, you can fit a ridge model. You’ll reuse the polynomial features example but apply ridge to it to see the effect of regularization.
This is similar to linear regression but with a penalty. If you compare ridge_model.coef_ to the earlier linear poly_model.coef_, you’d notice the ridge coefficients are smaller in magnitude (pulled closer to zero). By adjusting alpha, you can increase or decrease this effect. In practice, one would tune alpha to find a sweet spot between bias and variance.
from sklearn.linear_model import Ridge
# Using the polynomial features from earlier example (X_poly, y_train)
ridge_model = Ridge(alpha=1.0) # alpha is λ in sklearn (1.0 is a moderate penalty)
ridge_model.fit(X_poly, y_train)
print("Ridge coefficients:", ridge_model.coef_)
print("Ridge intercept:", ridge_model.intercept_)
Output
Because the data follow y = 1 + x^2 exactly, plain polynomial regression recovers coefficients [0, 1] with intercept 1. With alpha=1.0, the L2 penalty nudges the solution away from that exact fit, shifting a little weight onto the correlated linear term and slightly shrinking the quadratic term.
You should therefore see values roughly like:
Ridge coefficients: [0.11 0.98]
Ridge intercept: 0.89
Real-World Applications of Ridge Regression
Example Scenario | Description |
Portfolio Risk Modeling | Predict returns from correlated indicators. |
Medical Data (Multi-omics) | Model disease progression from many correlated genomic features. |
Manufacturing | Predict quality from correlated process parameters. |
General Regularized Prediction | Good when p >> n to reduce overfitting while keeping all features. |
In summary, ridge regression introduces bias to achieve lower variance in predictions – a desirable trade-off in many practical machine learning regression problems.
Lasso Regression (Least Absolute Shrinkage and Selection Operator) is another regularized version of linear regression, but it uses an L1 penalty instead of L2.
Here’s the cost function for Lasso:
Cost_lasso = Σ(from i=1 to N) [ (y_i - ŷ_i)^2 ] + λ * Σ(from j=1 to p) [ |β_j| ]
This has a special property: it can drive some coefficients exactly to zero when λ is sufficiently large, effectively performing feature selection. Lasso regression thus not only helps with overfitting but can produce a more interpretable model by eliminating irrelevant features.
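Here is a minimal sketch of that feature-selection behavior on synthetic data where most features are irrelevant (the data and the alpha value are illustrative):
import numpy as np
from sklearn.linear_model import Lasso
# Synthetic data: 10 features, but only features 0 and 3 actually drive y
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
y = 4 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=100)
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Selected (non-zero) features:", np.flatnonzero(lasso.coef_))  # typically [0 3]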
When to Use?
Key Characteristics of Lasso Regression
Code Snippet
Using scikit-learn’s Lasso class is straightforward. You’ll apply it to the same polynomial example for illustration.
After fitting, you may find that some coefficients are exactly 0. For example, if we had many polynomial terms, Lasso might zero out the higher-degree ones if they’re not contributing much.
In this small example with just two features (x and x^2), Lasso leans on the quadratic term and shrinks the linear term to zero, giving slightly different values than ridge or OLS.
from sklearn.linear_model import Lasso
lasso_model = Lasso(alpha=0.5) # alpha is λ, here chosen moderately
lasso_model.fit(X_poly, y_train)
print("Lasso coefficients:", lasso_model.coef_)
print("Lasso intercept:", lasso_model.intercept_)
Output
Because the true function y = 1 + x^2 already fits the data perfectly, Lasso finds an intercept close to 1 and a quadratic coefficient close to 1, with no need for a linear term (coefficient ≈ 0). Any small differences from exact values are just floating-point or solver tolerance effects.
Lasso coefficients: [0. 1.]
Lasso intercept: 1.0
Real-World Applications of Lasso Regression
Example Scenario | Description |
Sparse Signal Recovery | Identify relevant signals/genes by zeroing out others. |
Finance – Key Indicators | Pick top indicators from hundreds for stock price modeling. |
Marketing – Feature Selection | Select main drivers of customer spend from many features. |
Environment Modeling | Identify key sensors for air quality from wide sensor data. |
Elastic Net Regression combines the penalties of ridge and lasso to get the benefits of both. Its penalty is a mix of L1 and L2 – the formula is listed below:
Cost_elastic_net = Σ (y_i - ŷ_i)^2 + λ [ α Σ|β_j| + (1 - α) Σ(β_j)^2 ]
In the equation above, α (between 0 and 1) controls the mix between the L1 and L2 penalties (α = 1 is pure lasso, α = 0 is pure ridge), and λ controls the overall penalty strength.
In practice, one chooses a fixed α between 0 and 1 (e.g., 0.5 for an even mix) and then tunes λ. Elastic Net thus simultaneously performs coefficient shrinkage and can zero out some coefficients.
When to Use?
Key Characteristics of Elastic Net Regression
Code Snippet
Scikit-learn’s ElasticNet allows setting both α (l1_ratio in sklearn) and λ (alpha in sklearn).
In this example, alpha=0.1 is a moderate regularization strength and l1_ratio=0.5 gives equal weight to L1 and L2 penalties. The resulting coefficients will be somewhere between ridge and lasso in effect.
Let’s demonstrate:
from sklearn.linear_model import ElasticNet
# ElasticNet with 50% L1, 50% L2 (l1_ratio=0.5)
en_model = ElasticNet(alpha=0.1, l1_ratio=0.5) # alpha is overall strength (λ), l1_ratio is mix
en_model.fit(X_poly, y_train)
print("Elastic Net coefficients:", en_model.coef_)
print("Elastic Net intercept:", en_model.intercept_)
Output
Because the true function is y = 1 + x^2, the model typically learns a coefficient near 0 for the linear term, a coefficient near 1 for the quadratic term, and an intercept near 1.
You might see tiny numerical deviations (e.g., 0.9999) due to floating-point precision and regularization.
Elastic Net coefficients: [0. 1.]
Elastic Net intercept: 1.0
Real-World Applications of Elastic Net
Example Scenario | Description |
Genetics | Keep or drop correlated gene groups together. |
Economics | Group correlated indicators (e.g., inflation, interest). |
Retail | Retain or discard correlated store features. |
General high-dimensional data | Good compromise of shrinkage & selection when p >> n. |
Stepwise Regression is a variable selection method rather than a distinct regression model. It refers to an iterative procedure of adding or removing features from a regression model based on certain criteria (like p-values, AIC, BIC, or cross-validation performance). The goal is to arrive at a compact model with a subset of features that provides the best fit.
There are two main approaches: forward selection, which starts with no features and adds the most beneficial one at each step, and backward elimination, which starts with all features and removes the least useful one at each step.
A combination of both (adding and removing) is often called stepwise (or bidirectional) selection.
When to Use?
Key Characteristics of Stepwise Regression
Code Snippet
There isn’t a built-in scikit-learn function named “stepwise”, but one can implement forward or backward selection. Sklearn’s SequentialFeatureSelector can do this.
This will select 5 best features (you can adjust that or use cross-validation to decide when to stop).
from sklearn.feature_selection import SequentialFeatureSelector
# Assume X_train is a dataframe or array with many features
lr = LinearRegression()
sfs_forward = SequentialFeatureSelector(lr, n_features_to_select=5, direction='forward')
sfs_forward.fit(X_train, y_train)
selected_feats = sfs_forward.get_support(indices=True)
print("Selected feature indices:", selected_feats)
Output
A typical output — assuming X_train has multiple features — could look like this:
Selected feature indices: [0 2 4 7 9]
The exact indices depend on your dataset. The array shows which feature columns (by index) were chosen when selecting 5 features in forward selection mode.
Real-World Applications of Stepwise Regression
Example Scenario | Description |
Medical Research (Predictors) | Narrow down from many health factors. |
Economic Modeling | Find a small subset of indicators for GDP. |
Academic Research | Identify top variables among many measured. |
Initial Feature Screening | Get a quick feature subset before advanced models. |
Remember that stepwise methods should be validated on a separate test set to ensure the selected features generalize. They provide one way to handle different types of regression analysis by focusing on the most impactful predictors.
Decision Tree Regression in machine learning is a non-parametric model that predicts a continuous value by learning decision rules from the data.
It builds a binary tree structure: at each node of the tree, the data is split based on a feature and a threshold value, such that the target values in each split are as homogeneous as possible.
This splitting continues recursively until a stopping criterion is met (e.g., minimum number of samples in a leaf or maximum tree depth). The leaf nodes of the tree contain a prediction value (often the mean of the target values that fall in that leaf).
In essence, a decision tree regression partitions the feature space into rectangular regions and fits a simple model (constant value) in each region. The result is a piecewise constant approximation to the target function.
Unlike linear models, decision trees can capture nonlinear interactions between features easily by their branching structure.
Key Characteristics of Decision Tree Regression
Decision tree regression will exactly fit any data if not constrained, so typically, one limits depth or requires a minimum number of samples per leaf to prevent too many splits.
Code Snippet
Here, you will limit max_depth to 3 to prevent an overly complex tree. The tree will find splits on the age feature to partition the income values. The prediction for age 18 would fall into one of the learned leaf intervals, and the average income for that interval would be output.
from sklearn.tree import DecisionTreeRegressor
# Sample data: predicting y from X (where relationship may be non-linear)
X_train = [[5], [10], [17], [20], [25]] # e.g., years of age
y_train = [100, 150, 170, 160, 180] # e.g., some income that rises then dips then rises
tree = DecisionTreeRegressor(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
# Make a prediction
print("Predicted value for 18:", tree.predict([[18]])[0])
Output
With max_depth=3 on this tiny dataset, the two closest training points (17 → 170 and 20 → 160) end up sharing a leaf, so the prediction for 18 is their average. The exact partition can vary with the data and settings, but you will likely see a value close to 165.
Predicted value for 18: 165.0
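If you want to inspect the piecewise-constant rules the tree learned, scikit-learn's export_text prints the splits and leaf values for the tree fitted above:
from sklearn.tree import export_text
# Show each split threshold and the constant value predicted in each leaf
print(export_text(tree, feature_names=["age"]))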
Real-World Applications of Decision Tree Regression
Example Scenario | Description |
House Price Prediction (Rules) | Split houses by location, size, etc. for a final price leaf. |
Medicine – Dosage Effect | Split on dose and age for predicted response. |
Manufacturing Quality | Split on sensor readings for quality. |
Customer Value Prediction | Segments customers into leaves for value. |
Random Forest Regression is an ensemble learning method that builds on decision trees. The idea is to create a large number of decision trees (a forest) and aggregate their predictions (typically by averaging for regression).
Each individual tree is trained on a random subset of the data and/or features (hence "random"). Specifically, random forests use the following: bootstrap sampling, where each tree sees a random sample of the training data drawn with replacement, and random feature subsets, where each split considers only a randomly chosen subset of the features.
These two sources of randomness make the trees diverse. While any single tree might overfit, the average of many overfitting trees can significantly reduce variance. Random forests thus achieve better generalization than a single tree while maintaining the ability to handle non-linear relationships.
Key Characteristics of Random Forest Regression
Code Snippet
Here, you create a forest of 100 trees (a common default) and set max_depth=3 to keep each tree small for interpretability (in practice, you might let trees grow deeper or until leaves reach a minimum size).
The prediction for 18 will be an average of 100 different decision tree predictions, yielding a more stable result than an individual tree.
from sklearn.ensemble import RandomForestRegressor
# Continuing with the previous example data
rf = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=42)
rf.fit(X_train, y_train)
print("Random Forest prediction for 18:", rf.predict([[18]])[0])
Output
A common prediction when running this random forest code is around:
Random Forest prediction for 18: 168.4
The exact number can vary slightly because each tree is trained on a different bootstrap sample and individual split choices can differ across scikit-learn versions; fixing random_state keeps results reproducible within a given setup.
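A practical bonus of random forests is the built-in feature importance estimate. Here is a minimal sketch on synthetic multi-feature data (the data-generating choices are illustrative):
import numpy as np
from sklearn.ensemble import RandomForestRegressor
# Synthetic data: feature 0 drives the target, feature 2 helps a little, feature 1 is pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 5 * X[:, 0] + 1 * X[:, 2] + rng.normal(scale=0.5, size=300)
rf_demo = RandomForestRegressor(n_estimators=200, random_state=0)
rf_demo.fit(X, y)
print("Feature importances:", np.round(rf_demo.feature_importances_, 3))
# Expect feature 0 to dominate, feature 2 to get a smaller share, feature 1 near zero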
Real-World Applications of Random Forest Regression
Example Scenario | Description |
Stock Market Prediction | Average many decision trees for stable forecasts. |
Energy Load Forecasting | Capture complex weather-demand interactions. |
Predicting Equipment Failure Time | Average multiple trees for robust time-to-failure. |
General Tabular Data Regression | Often a top choice for structured data. |
Random forests are a go-to machine learning regression algorithm when you want good performance with minimal tuning. They handle a variety of data types and are resilient to outliers and scaling issues (no need for normalization typically). The main downsides are model size (hundreds of trees can be large) and interpretability.
Also Read: Random Forest Algorithm: When to Use & How to Use? [With Pros & Cons]
Support Vector Regression applies the principles of Support Vector Machines (SVM) to regression problems. The idea is to find a function (e.g., a hyperplane in feature space) that deviates from the actual targets by, at most, a certain epsilon (ε) for each training point and is as flat as possible.
In SVR, you specify an ε-insensitive zone: if a prediction is within ε of the true value, the model does not incur a loss for that point. Only points outside this margin (where the prediction error is larger than ε) will contribute to the loss – those are called support vectors.
Mathematically, for linear SVR, we solve an optimization problem:
Minimize: (1/2) * ||w||^2
Subject to: |y_i - (w · x_i + b)| ≤ ε for all i
(In practice, slack variables with a penalty parameter C allow some points to fall outside the ε-tube, which is why the code below sets C.)
Intuitively, it tries to fit a tube of width 2ε around the data. A wider tube (large ε) allows more error but fewer support vectors (so a simpler model), while a narrower tube forces precision (potentially more complex model).
Kernel tricks can be used to perform non-linear regression by mapping features to higher-dimensional spaces, similar to SVM classification.
Key Characteristics of SVR
Code Snippet
In this example, the RBF kernel SVR will fit a smooth curve through the points within the tolerance ε. The predicted value for 2.5 will lie on that learned curve.
from sklearn.svm import SVR
# Sample non-linear data (for demonstration)
X_train = [[0], [1], [2], [3], [4], [5]]
y_train = [0.5, 2.2, 3.9, 5.1, 4.9, 6.8] # somewhat nonlinear progression
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)
# Predict on a new point
print("SVR prediction for 2.5:", svr.predict([[2.5]])[0])
Output
A typical prediction when running this SVR code is around:
SVR prediction for 2.5: 4.3
The exact number can vary due to the RBF kernel’s smoothing and default hyperparameters, but you’ll generally see a value between 3.9 (X=2) and 5.1 (X=3).
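Because SVR relies on distances in feature space, it is sensitive to feature scales, so in practice it is common to wrap it in a pipeline with standardization. A minimal sketch reusing the same toy data (the C value is just an illustrative choice):
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
X_train = [[0], [1], [2], [3], [4], [5]]
y_train = [0.5, 2.2, 3.9, 5.1, 4.9, 6.8]
# Standardize the feature, then fit an RBF-kernel SVR
svr_pipeline = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=10.0, epsilon=0.1))
svr_pipeline.fit(X_train, y_train)
print("Scaled-pipeline SVR prediction for 2.5:", svr_pipeline.predict([[2.5]])[0])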
Real-World Applications of SVR
Example Scenario | Description |
Financial Time Series | Fit complex patterns with RBF kernel. |
Temperature Prediction | Model non-linear relationships with little data. |
Engineering – Smoothing | Tolerate small errors within ε for a smooth curve. |
Small Dataset Regression | Capture complexity when data is limited. |
SVR is one of the more advanced regression models in machine learning. If tuned well, it offers a good balance between bias and variance, particularly for smaller datasets with non-linear relationships.
You can also check out upGrad’s free tutorial, Support Vector Machine (SVM) for Anomaly Detection. Learn how it works and its step-by-step implementation.
Principal Component Regression is a technique that combines Principal Component Analysis (PCA) with linear regression. The main idea is to address multicollinearity and high dimensionality by first transforming the original features into a smaller set of principal components (which are uncorrelated) and then using those components as predictors in a regression model.
Steps in PCR: (1) standardize the features, (2) run PCA on the predictors to obtain uncorrelated principal components, (3) keep the top k components that explain most of the variance, and (4) fit a linear regression of y on those k components.
When to Use?
Key Characteristics of PCR
Code Snippet
In this snippet, you will reduce 100 original features to 10 components and then fit a regression on those 10. One would choose 10 by checking how much variance those components explain or via CV.
The pca.components_ attribute can tell us which original features contribute to each component, but interpretation is not as straightforward as a normal linear model.
from sklearn.decomposition import PCA
# Suppose X_train has 100 features, and we suspect many are correlated/redundant
pca = PCA(n_components=10) # reduce to 10 components
X_train_reduced = pca.fit_transform(X_train)
# Now X_train_reduced has 10 features (principal components). Do linear regression on these.
lr = LinearRegression()
lr.fit(X_train_reduced, y_train)
# To predict on new data, remember to transform it through the same PCA:
X_test_reduced = pca.transform(X_test)
predictions = lr.predict(X_test_reduced)
Output
Below is a concise example of what your console might display if you print the reduced training shape, the regression coefficients and intercept, and the predictions:
X_train_reduced shape: (200, 10)
Coefficients: [ 0.12 -0.05 0.07 -0.01 0.03 0.02 0.09 -0.08 0.01 0.04]
Intercept: 3.2
Predictions on X_test:
[10.81 9.95 11.42 8.77 ... ]
Real-World Applications of Principal Component Regression
Example Scenario | Description |
Chemometrics | Predict concentration from many correlated spectral features. |
Image Regression | Reduce dimensionality (eigenfaces) before regression. |
Economic Indices | Combine many correlated indicators into fewer components. |
Environmental Data | Compress redundant sensors, then predict outcome. |
PCR is valuable when you have more features than observations or highly correlated inputs. By focusing on the major components of variation in the data, it trades a bit of optimal predictive power for a simpler, more robust model.
Also Read: PCA in Machine Learning: Assumptions, Steps to Apply & Applications
Partial Least Squares Regression (PLS) is another technique for dealing with high-dimensional, collinear data. It is somewhat similar to PCR but with an important twist: PLS is a supervised method.
It finds new features (components) that are linear combinations of the original predictors while also taking into account the response variable y in determining those components. In other words, PLS tries to find directions in the feature space that have high covariance with the target. This often makes PLS more effective than PCR in predictive tasks because it doesn’t ignore y when reducing dimensions.
PLS produces a set of components (also called latent vectors) with two sets of weights: one for transforming X and one for y (for multivariate Y, though in simple regression Y is one-dimensional). You choose the number of components to keep similar to PCR.
When to Use?
Key Characteristics of PLS
Code Snippet
This code will compute 10 PLS components and use them to fit the regression. Under the hood, it’s finding weight vectors for X and y such that covariance is maximized. You can inspect pls.x_weights_ or pls.x_loadings_ to see how original features contribute to components.
from sklearn.cross_decomposition import PLSRegression
pls = PLSRegression(n_components=10)
pls.fit(X_train, y_train)
# After fitting, we can predict normally
y_pred = pls.predict(X_test)
Output
Below is a concise example of what you might see if you print y_pred (exact numbers will vary depending on your data):
[10.24 9.56 11.43 ...]
Here, [10.24 9.56 11.43 ...] represents the predicted y values for the samples in your X_test. Since PLSRegression is trained with 10 components, it uses those components to produce these final predictions.
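A common practical question with PLS is how many components to keep. Here is a hedged sketch of answering it with cross-validation on synthetic data (the data-generating process below is purely illustrative):
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score
# Synthetic high-dimensional data: 60 samples, 40 correlated features built from 3 latent factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(60, 3))
X = latent @ rng.normal(size=(3, 40)) + 0.1 * rng.normal(size=(60, 40))
y = latent[:, 0] - 2 * latent[:, 1] + 0.1 * rng.normal(size=60)
# Compare cross-validated R^2 for different numbers of PLS components
for k in (1, 2, 3, 5, 10):
    score = cross_val_score(PLSRegression(n_components=k), X, y, cv=5, scoring='r2').mean()
    print(f"{k} components: mean CV R^2 = {score:.3f}")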
Real-World Applications of PLS Regression
Example Scenario | Description |
Chemistry and Spectroscopy | Focus on variations relevant to property of interest. |
Genomics (QTL analysis) | Distill many genetic markers to latent factors correlated with phenotype. |
Manufacturing | Identify composite process factors that drive quality. |
Social Science | Combine correlated socioeconomic indicators that best predict an outcome. |
PLS regression is a powerful method when you're in the realm of “small n, large p” (few observations, many features) and want to reduce dimensionality in a way that’s oriented toward prediction. It fills an important niche between pure feature extraction (like PCA) and pure regression.
Bayesian Regression refers to a family of regression techniques that incorporate Bayesian principles into the modeling. In contrast to classical regression, which finds single best-fit parameters (point estimates), Bayesian regression treats the model parameters (coefficients) as random variables with prior distributions. It produces a posterior distribution for these parameters given the data.
This means instead of one set of coefficients, you get a distribution (mean and uncertainty) for each coefficient, and predictions are distributions as well (with credible intervals).
One common approach is Bayesian Linear Regression: assume a prior (often Gaussian) for the coefficients β and maybe for noise variance, then update this prior with the data (likelihood) to get a posterior.
The resulting posterior can often be derived in closed-form for linear regression with conjugate priors (Gaussian prior + Gaussian likelihood yields Gaussian posterior, which is the basis of Bayesian Ridge in scikit-learn). The prediction is typically the mean of the posterior predictive distribution, and you also get uncertainty (variance).
When to Use?
Key Characteristics of Bayesian Regression
Code Snippet
Scikit-learn offers BayesianRidge, which is a Bayesian version of linear regression with a Gaussian prior on coefficients. BayesianRidge also estimates the noise variance and includes automatic tuning of priors via evidence maximization.
The output shows the mean of the coefficient posterior, from which you can derive uncertainty (the coefficients' covariance matrix is sigma_).
from sklearn.linear_model import BayesianRidge
bayes_ridge = BayesianRidge()
bayes_ridge.fit(X_train, y_train)
print("Coefficients (mean of posterior):", bayes_ridge.coef_)
print("Coefficient uncertainties (std of posterior):", np.sqrt(bayes_ridge.sigma_))
Output
Below is an example of what you might see (the exact numbers depend on your data):
Coefficients (mean of posterior): [1.02]
Coefficient uncertainties (std of posterior): [0.15]
Real-World Applications of Bayesian Regression
Example Scenario | Description |
Medical Prediction (with uncertainty) | Provide predictions and confidence intervals. |
Econometrics | Combine prior theory with data for parameter distributions. |
Engineering - Calibration | Incorporate prior knowledge for model parameters. |
Adaptive Modeling | Update posterior with new data for real-time personalization. |
Bayesian regression provides a probabilistic framework for regression, yielding richer information than classic point estimation. It ensures that you understand the uncertainty in predictions, which is crucial for many real-world applications where decisions depend on confidence in the results.
Also Read: Bayesian Linear Regression: What is, Function & Real Life Applications in 2024
Quantile Regression is a type of regression that estimates the conditional quantile (e.g., median or 90th percentile) of the response variable as a function of the predictors, instead of the mean.
Unlike ordinary least squares, which minimizes squared error (and thus focuses on the mean), quantile regression minimizes the sum of absolute errors weighted asymmetrically to target a specific quantile.
For example, Median Regression (0.5 quantile) minimizes absolute deviations (50% of points above, 50% below, akin to least absolute deviations). The 0.9 quantile regression would ensure ~90% of the residuals are negative and 10% positive (focusing on the upper end of distribution).
The quantile loss function for quantile q is: for residual r, loss = q*r if r >= 0, and = (q-1)*r if r < 0.
This creates a tilted absolute loss that penalizes over-predictions vs under-predictions differently to hit the desired quantile. Essentially, quantile regression gives a more complete view of the relationship between X and Y by modeling different points of the distribution of Y.
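Here is a small sketch of that tilted ('pinball') loss to make the asymmetry concrete:
import numpy as np
def pinball_loss(y_true, y_pred, q):
    # Quantile (pinball) loss: residuals above and below zero are penalized asymmetrically
    r = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.where(r >= 0, q * r, (q - 1) * r))
y_true = np.array([10.0, 10.0])
# For q = 0.9, under-predicting (positive residual) costs 9x more than over-predicting
print("q=0.9, predictions too low :", pinball_loss(y_true, [8.0, 8.0], 0.9))    # 0.9 * 2 = 1.8
print("q=0.9, predictions too high:", pinball_loss(y_true, [12.0, 12.0], 0.9))  # 0.1 * 2 = 0.2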
When to Use?
Key Characteristics of Quantile Regression
Code Snippet
Scikit-learn provides QuantileRegressor (introduced in v1.1) for linear quantile regression; alternatively, you can use tree ensembles with a quantile loss (e.g., GradientBoostingRegressor with loss='quantile'). For illustration, we use QuantileRegressor here, which fits a median regression (0.5 quantile).
# This snippet assumes sklearn >= 1.1 for QuantileRegressor
from sklearn.linear_model import QuantileRegressor
median_reg = QuantileRegressor(quantile=0.5, alpha=0) # alpha=0 for no regularization
median_reg.fit(X_train, y_train)
print("Coefficients for median regression:", median_reg.coef_)
Output
A typical console output (assuming a single feature) might be:
Coefficients for median regression: [1.]
If you had multiple features, you’d see something like [1.0 -0.2 0.5 ...]. Exact values depend on your dataset, but the array represents the slope estimates for the specified quantile (in this case, the median).
Real-World Applications of Quantile Regression:
Example Scenario | Description |
Housing Market Analysis | Predict 10th/90th percentile house prices. |
Weather and Climate | Model extreme rainfall/temperature quantiles. |
Traffic and Travel Time | Estimate upper travel time bounds for planning. |
Finance – Value at Risk | Directly model high quantile losses for risk. |
Quantile regression adds another dimension to understanding predictive relationships by not restricting to the mean outcome. It is a powerful tool when distributions are skewed, have outliers, or when different quantiles exhibit different relationships with predictors.
Poisson Regression is a type of generalized linear model (GLM) used for modeling count data in situations where the response variable is a count (0, 1, 2, ...) that often follows a Poisson distribution.
It is appropriate when the counts are assumed to occur independently, and the mean of the distribution equals its variance (a property of Poisson, though this can be relaxed later).
Commonly, Poisson regression models the log of the expected count as a linear combination of features:
log(E[Y | X]) = b0 + b1*x1 + ... + bp*xp
Exponentiating both sides:
E[Y | X] = exp(b0 + b1*x1 + ... + bp*xp)
The model is typically fitted by maximum likelihood (equivalent to minimizing deviance for GLM). Poisson regression assumes the conditional distribution of Y given X is Poisson, which implies variance = mean for those counts.
Poisson might not fit well if the data show overdispersion (variance > mean). Then, variants like quasi-Poisson or Negative Binomial can be used.
When to Use?
Key Characteristics of Poisson Regression
Code Snippet
Python’s statsmodels can fit GLMs including Poisson.
In scikit-learn, you can also use PoissonRegressor for a more machine-learning API approach, which implements Poisson regression via gradient descent.
import statsmodels.api as sm
# Assume X_train is a 2D array of features, y_train are count outcomes
X_train_sm = sm.add_constant(X_train) # add intercept term
poisson_model = sm.GLM(y_train, X_train_sm, family=sm.families.Poisson())
poisson_results = poisson_model.fit()
print(poisson_results.summary())
Output
Below is an example of the Poisson regression summary you might see (details vary with your data):
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y_train No. Observations: 100
Model: GLM Df Residuals: 98
Model Family: Poisson Df Model: 1
Link Function: log Scale: 1.0000
Method: IRLS Log-Likelihood: -220.3045
Date: Thu, 01 Jan 2025 Deviance: 45.6322
Time: 00:00:00 Pearson chi2: 44.581
No. Iterations: 4 Pseudo R-squ. (CS): 0.2183
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -2.1056 0.233 -9.042 0.000 -2.563 -1.648
x1 0.5076 0.093 5.464 0.000 0.325 0.690
==============================================================================
Please Note: The coefficients are on the log scale. Exponentiating a coefficient gives the multiplicative effect on the expected count – here, exp(0.5076) ≈ 1.66, meaning each one-unit increase in x1 multiplies the expected count by about 1.66.
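As mentioned above, scikit-learn's PoissonRegressor offers a more machine-learning-style API for the same kind of model. A minimal sketch on synthetic count data (the data-generating coefficients are illustrative):
import numpy as np
from sklearn.linear_model import PoissonRegressor
# Synthetic count data: the expected count grows exponentially with the feature (log link)
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(200, 1))
y = rng.poisson(lam=np.exp(0.5 + 1.2 * X[:, 0]))
pois = PoissonRegressor(alpha=0.0)   # alpha=0 turns off L2 regularization
pois.fit(X, y)
print("Estimated intercept:", round(pois.intercept_, 2))      # should be close to 0.5
print("Estimated coefficient:", np.round(pois.coef_, 2))      # should be close to 1.2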
Real-World Applications of Poisson Regression:
Example Scenario | Description |
Public Health – Disease Incidence | Model disease counts based on risk factors. |
Insurance – Claim Counts | Predict number of claims per policy. |
Call Center Volume | Forecast call counts per hour. |
Web Analytics | Count visits or clicks in time intervals. |
Also Read: Generalized Linear Models (GLM): Applications, Interpretation, and Challenges
Cox Regression, or Cox Proportional Hazards Model, is a regression technique used for survival analysis (time-to-event data). Unlike previous regressions, which predict a numeric value directly, Cox regression models the hazard function – essentially the instantaneous risk of the event occurring at time t, given that it hasn’t occurred before t, as a function of covariates.
It is a semi-parametric model: it doesn’t assume a particular baseline hazard function form, but it assumes the effect of covariates is multiplicative on the hazard and constant over time (hence “proportional hazards”).
Here's the formula:
h(t | X) = h0(t) * exp( b1*x1 + ... + bp*xp )
In the equation above, h0(t) is the baseline hazard (left unspecified), and exp(bj) is the hazard ratio for covariate xj – the multiplicative change in risk for a one-unit increase in that covariate.
Cox regression is often used to estimate these hazard ratios for factors while accounting for censoring (some subjects’ events not observed within study time).
When to Use?
Key Characteristics of Cox Regression
Code Snippet
This code illustrates how to model survival data with Cox Regression:
from lifelines import CoxPHFitter
import pandas as pd
# Sample dataset with survival times and covariates
data = pd.DataFrame({
'time': [5, 8, 12, 3, 15], # Survival time
'event': [1, 1, 0, 1, 0], # Event (1) or censored (0)
'age': [45, 50, 60, 35, 55],
'treatment': [1, 0, 1, 1, 0]
})
cph = CoxPHFitter()
cph.fit(data, duration_col='time', event_col='event')
cph.print_summary()
Output
When you run this code, you’ll see a table describing model coefficients, including hazard ratios:
coef exp(coef) se(coef) ...
age 0.0150 1.0151 0.0260
treatment -0.1100 0.8958 0.0300
...
Concordance = 0.80
Real-World Applications of Cox Regression
Example Scenario | Description |
Clinical Trial Survival | Compare hazard rates of new drug vs. placebo. |
Customer Churn | Model time until churn with hazard ratios. |
Mechanical Failure | Assess how conditions affect failure time. |
Employee Turnover | Evaluate hazard of leaving given covariates. |
Cox regression is a powerful tool in regression analysis that focuses on time-to-event outcomes. It bridges statistics and practical decision-making, especially in life sciences and engineering. It provides insight into how factors impact the rate of an event occurrence over time.
Time Series Regression refers to regression methods specifically applied to time-indexed data where temporal order matters. In a narrow sense, it could mean using time as an explicit variable in a regression.
More broadly, it often involves using lagged values of the target or other time series as features to predict the target at future time steps. Many classical time series models (AR, ARMA, ARIMAX) can be viewed as regression models on past values.
Examples: regressing this month’s sales on last month’s sales (a lag term), a time trend, and seasonal dummy variables; or an ARIMAX-style model that adds external regressors to an autoregressive structure.
Time series regression often requires handling autocorrelation in residuals (which violates standard regression assumptions). Techniques like adding lag terms, using ARIMA errors, or generalized least squares are used.
When to Use?
Key Characteristics of Time Series Regression
Code Snippet
Use this approach when you want to see if there’s a trend over time in your numerical data.
from statsmodels.tsa.tsatools import add_trend
import pandas as pd
from statsmodels.api import OLS
# Sample dataset
data = pd.DataFrame({
'time': [1, 2, 3, 4, 5],
'sales': [200, 220, 240, 230, 260]
})
# Add a constant (intercept)
data = add_trend(data, trend='c')
model = OLS(data['sales'], data[['const', 'time']]).fit()
print(model.summary())
Output
The output is an OLS summary table with something like this:
OLS Regression Results
==============================================================================
Dep. Variable: sales R-squared: 0.845
Model: OLS Adj. R-squared: 0.793
...
coef std err t P>|t|
-------------------------------------------------------------------------------
const 191.0000 10.664 17.911 0.000
time 13.0000 3.215 4.044 0.027
...
The slope is about 13, indicating an average increase of roughly 13 sales units per time step.
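Beyond a deterministic time trend, you can also use lagged values of the target as predictors, as described earlier. A minimal sketch with pandas and scikit-learn (the sales numbers are illustrative):
import pandas as pd
from sklearn.linear_model import LinearRegression
# Illustrative monthly sales series
sales = pd.Series([200, 220, 240, 230, 260, 255, 270, 290, 285, 300])
# Build lag features: predict this month's sales from the previous two months
df = pd.DataFrame({'sales': sales})
df['lag1'] = df['sales'].shift(1)
df['lag2'] = df['sales'].shift(2)
df = df.dropna()   # the first two rows have no lag values
model = LinearRegression().fit(df[['lag1', 'lag2']], df['sales'])
# Forecast the next value: lag1 = most recent observation (300), lag2 = the one before (285)
next_X = pd.DataFrame({'lag1': [300], 'lag2': [285]})
print("One-step-ahead forecast:", round(model.predict(next_X)[0], 1))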
Real-World Applications of Time Series Regression
Example Scenario | Description |
Economic Forecasting | Use lagged GDP and indicators. |
Energy Load Forecasting | Predict next-day demand from weather and past usage. |
Website Traffic | Forecast daily visits with seasonal patterns. |
Stock Prices | Regress on past prices and macro data. |
Panel Data Regression is used when you have panel data (also called longitudinal data) – that is, multiple entities observed across time (or other contexts).
For example, test scores of multiple schools measured yearly or economic data of multiple countries over decades. Panel data regression models aim to account for both cross-sectional and time series variation, often focusing on controlling for unobserved heterogeneity across entities.
Two common approaches: Fixed Effects (FE) and Random Effects (RE) models
FE uses dummy variables for entities (or, equivalently, de-means the data within each entity) to control for any constant omitted factors for that entity. It focuses on within-entity variation – how changes over time in X relate to changes in Y for the same entity.
Random Effects (RE), in contrast, treats entity-specific effects as random draws from a common distribution rather than estimating a separate dummy for each entity. The benefit is that RE can include time-invariant covariates (whereas FE cannot, since the entity dummies absorb them), and RE is generally more efficient if its assumptions hold – in particular, that the entity effects are uncorrelated with the regressors.
When to Use?
Key Characteristics of Panel Data Regression
Code Snippet
Use this approach for data that tracks multiple entities over time, allowing you to account for differences between entities.
import statsmodels.api as sm
from linearmodels.panel import PanelOLS
import pandas as pd
# Sample panel dataset
panel_data = pd.DataFrame({
'id': [1, 1, 2, 2, 3, 3],
'year': [2020, 2021, 2020, 2021, 2020, 2021],
'y': [3, 4, 2, 5, 1, 3],
'x': [10, 12, 8, 9, 7, 6]
}).set_index(['id', 'year'])
model = PanelOLS(panel_data['y'], sm.add_constant(panel_data['x']), entity_effects=True)
results = model.fit()
print(results.summary)
Output
When you run this, the summary helps you see how x relates to y once you control for entity-specific intercepts:
PanelOLS Estimation Summary
================================================================================
Dep. Variable: y R-squared: 0.50
...
Coefficients Std. Err. T-stat P-value ...
x 0.5000 0.2500 2.0000 0.1300
...
Real-World Applications of Panel Data Regression
Example Scenario | Description |
Economics – Policy Impact | Control for state-specific and time effects. |
Education | Within-student changes to test scores over time. |
Marketing – Panel Surveys | Account for consumer-specific baselines. |
Manufacturing | Different machines tracked over time, controlling for machine-specific traits. |
Regression analysis in machine learning offers a range of benefits, making it an indispensable tool in data-driven decision-making and predictive modeling.
Here’s how it adds value:
Benefit | Description |
Quantifying Relationships | Measures how independent variables impact a dependent variable. |
Prediction and Forecasting | Enables accurate predictions for continuous outcomes. |
Identifying Significant Variables | Highlights the most influential predictors among multiple variables. |
Model Evaluation | Provides tools like R-squared and error metrics to evaluate model performance. |
Control and Optimization | Optimizes processes by understanding variable interactions. |
Risk Management | Assesses potential risks by analyzing variable relationships and their uncertainty. |
Decision Support | Guides strategic choices with data-backed insights for better resource allocation and planning. |
Ready to advance your career? Get personalized counseling from upGrad’s experts to help you choose the right program for your goals. You can also visit your nearest upGrad Career Center to kickstart your future!