18 Types of Regression in Machine Learning [Explained With Examples]
Updated on Feb 18, 2025 | 47 min read
Regression in machine learning is a core technique used to model the relationship between a dependent variable (target) and one or more independent variables (features). Unlike classification (which predicts categories), regression deals with continuous numeric outcomes.
In simple terms, regression algorithms try to find a best-fit line or curve that can predict output values (Y) from input features (X). This makes regression analysis essential for data science tasks like forecasting and trend analysis.
There are many different types of regression models, each suited to specific kinds of problems and data. From straightforward Linear Regression to advanced techniques like Ridge/Lasso regularization and Decision Tree regression, knowing the distinctions is crucial. This guide will explore 18 different regression models and their real-world applications.
- 18 Types of Regression in Machine Learning at a Glance
Below is a concise overview of the 18 types of regression in machine learning, each suited to different data characteristics and modeling goals. Use this table to quickly recall their primary applications or when you might consider each method.
| Regression Type | Primary Use |
| --- | --- |
| 1. Linear Regression | Baseline model for continuous outcomes under linear assumptions. |
| 2. Logistic Regression | Classification tasks (binary or multiclass) with interpretable log-odds. |
| 3. Polynomial Regression | Modeling curved relationships by adding polynomial terms. |
| 4. Ridge Regression | L2-penalized linear model to reduce variance and handle multicollinearity. |
| 5. Lasso Regression | L1-penalized linear model for feature selection and sparsity. |
| 6. Elastic Net Regression | Combination of L1 and L2 penalties balancing shrinkage and selection. |
| 7. Stepwise Regression | Iterative feature selection for simpler exploratory models. |
| 8. Decision Tree Regression | Rule-based splits handling non-linear effects with interpretability. |
| 9. Random Forest Regression | Ensemble of trees for better accuracy and reduced overfitting. |
| 10. Support Vector Regression (SVR) | Flexible function fitting with margin-based, kernel-driven approach. |
| 11. Principal Component Regression (PCR) | Dimensionality reduction first, then regression on principal components. |
| 12. Partial Least Squares (PLS) Regression | Supervised dimensionality reduction focusing on variance relevant to y. |
| 13. Bayesian Regression | Incorporates prior knowledge and provides uncertainty estimates. |
| 14. Quantile Regression | Predicting specific quantiles (median, tails) for robust analysis. |
| 15. Poisson Regression | Count data modeling under assumption that mean ≈ variance. |
| 16. Cox Regression | Time-to-event analysis handling censored data in survival settings. |
| 17. Time Series Regression | Forecasting with temporal structures and autocorrelation. |
| 18. Panel Data Regression | Modeling multiple entities across time, controlling for unobserved heterogeneity. |
Now that you have seen the types at a glance, let’s explore all the regression models in detail.
Please Note: All code snippets for regression types explained below are in Python with common libraries like scikit-learn. You can run them to see how each regression model works in practice.
1. Linear Regression
Linear Regression in machine learning is the most fundamental and widely used regression technique. It assumes a linear relationship between the independent variable(s) X and the dependent variable Y. The model tries to fit a straight line (in multi-dimensional space, a hyperplane) that best approximates all the data points.
The simplest form is Simple Linear Regression with one feature – find the formula below:
y = b0 + b1*x + e
In the equation above:
- b0 (intercept) is how much y is offset when x = 0
- b1 (coefficient) is how much y changes with a one-unit increase in x
- e is the error term
For multiple features, there’s Multiple Linear Regression – find the formula below:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn + e
In the equation above:
- b0 is again the intercept
- b1 ... bn are the coefficients for each feature x1 ... xn
- e is the error term
The end goal is to find the coefficient values (b0 ... bn) that minimize the error, most often using Least Squares, which minimizes the sum of squared errors between predicted and actual y.
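Written in the same notation used for the regularized models later in this guide, the ordinary least squares objective is:
Cost_OLS = Σ(from i=1 to N) [ (y_i - ŷ_i)^2 ]
Here, ŷ_i is the model's prediction for the i-th data point, and the coefficients are chosen to make this sum as small as possible.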
Key Characteristics of Linear Regression
- Fast and Easy to Interpret: Each coefficient clearly shows how a feature affects the target.
- Assumes a Linear Relationship: Performance suffers if the true relationship is non-linear.
- Sensitive to Outliers: Extreme data points can skew results significantly.
- Requires Assumption Checks: Works best when data doesn’t violate assumptions like homoscedasticity (constant error variance).
- Common Starting Point: Often, it is the first model tried in regression tasks for its simplicity and transparency.
Code Snippet
Below, you will fit a simple linear regression using scikit-learn’s LinearRegression class. This example assumes you have training data X_train (2D array of features) and y_train (target values). You can then predict on test data X_test.
from sklearn.linear_model import LinearRegression
# Sample training data
X_train = [[1.0], [2.0], [3.0], [4.0]] # e.g., feature = size of house (1000s of sq ft)
y_train = [150, 200, 250, 300] # e.g., target = price (in $1000s)
# Train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Coefficients and intercept
print("Slope (β1):", model.coef_[0])
print("Intercept (β0):", model.intercept_)
# Predict on new data
X_test = [[5.0]] # e.g., a 5000 sq ft house (feature measured in 1000s of sq ft)
pred = model.predict(X_test)
print("Predicted price for 5000 sq ft house:", pred[0])
Output
This code outputs the learned slope and intercept for a simple linear regression, then predicts the price for a 5000 sq ft house:
- Slope (β1): ~50 → Each extra 1000 sq ft adds $50,000 to the price.
- Intercept (β0): ~100 → A 0 sq ft house has a baseline price of $100,000.
- Prediction: ~350 → The model predicts a $350,000 price for a 5000 sq ft house.
Slope (β1): 50.0
Intercept (β0): 100.0
Predicted price for 5000 sq ft house: 350.0
Real-World Applications of Linear Regression
| Example | Use Case |
| --- | --- |
| House Prices | Predicts price based on size, location, and features. |
| Sales Forecasting | Estimates sales from ad spend, seasonality, etc. |
| Student Scores | Models exam scores based on study hours. |
| Salary Estimation | Predicts salary from years of experience. |
Also Read: Assumptions of Linear Regression
2. Logistic Regression
Logistic Regression in machine learning is a popular technique for classification problems (especially binary classification), but it is often taught alongside regression models because it uses a regression-like approach with a non-linear transformation.
- In logistic regression, the dependent variable is discrete (usually 0/1 or True/False).
- The model predicts the probability of the positive class (e.g., probability that Y=1) using the sigmoid (logistic) function to constrain outputs between 0 and 1.
Instead of fitting a straight line, logistic regression fits an S-shaped sigmoid curve – find the formula below:
sigmoid(z) = 1 / (1 + e^(-z))
In the equation above:
z = b0 + b1*x1 + b2*x2 + ... + bn*xn
- sigmoid(z) outputs a value between 0 and 1, interpreted as the probability P(Y=1 | X).
- b0 ... bn are the intercept and coefficients for features x1, x2, ..., xn.
The model is typically trained by maximizing the likelihood (or equivalently minimizing log-loss) rather than least squares. Logistic regression assumes the log-odds of the outcome is linear in X.
For a binary outcome, it outputs a probability, and you decide on a threshold (like 0.5) to classify it as 0 or 1.
Key Characteristics of Logistic Regression
- Simple & Effective: Works well when classes are roughly linearly separable.
- Interpretable Coefficients: Each coefficient reflects the log-odds effect of a feature.
- Classification Only: Not suited for predicting continuous outcomes; best for binary or multiclass classification.
- Extendable to Multiclass: Variants like multinomial and ordinal logistic regression handle multiple or ordered categories.
- Performs Best With Less Feature Correlation: Especially effective on larger datasets where features are not heavily correlated.
Code Snippet
Below is how you might train a logistic regressor for a binary classification (e.g., predict if a student passed an exam (1) or not (0) based on hours studied).
from sklearn.linear_model import LogisticRegression

# Sample training data
X_train = [[1], [2], [3], [4]] # hours studied
y_train = [0, 0, 1, 1] # 0 = failed, 1 = passed
# Train Logistic Regression model
clf = LogisticRegression()
clf.fit(X_train, y_train)
# Predict probabilities for a new student who studied 2.5 hours
prob = clf.predict_proba([[2.5]])[0][1]
print("Probability of passing with 2.5 hours of study: %.3f" % prob)
Output
The predict_proba method gives the probability of each class. In this example, the model outputs a probability of about 0.5 for passing with 2.5 hours, which you could compare to a threshold (0.5 by default in predict) to classify as fail (0) or pass (1).
Probability of passing with 2.5 hours of study: 0.500
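To turn that probability into a class label, you can either call predict (which applies the default 0.5 threshold) or compare predict_proba against a threshold of your own. Below is a minimal sketch continuing from the fitted clf above; the 0.4 threshold is an arbitrary example, not a recommendation.
# Default classification: threshold of 0.5 applied internally
print("Predicted class for 2.5 hours:", clf.predict([[2.5]])[0])
# Custom threshold applied manually to the probability
prob = clf.predict_proba([[2.5]])[0][1]
print("Class with a 0.4 threshold:", int(prob >= 0.4))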
Real-World Applications of Logistic Regression
| Example | Use Case |
| --- | --- |
| Spam Detection | Classifies emails as spam or not spam. |
| Medical Diagnosis | Predicts disease presence based on test results. |
| Credit Default | Identifies risky borrowers. |
| Marketing Response | Predicts if a customer will buy after seeing an ad. |
Logistic regression shines in its simplicity and interpretability (via odds ratios). It’s a great first choice for binary classification and one of the essential regression analysis types in machine learning (albeit for categorical outcomes).
Also Read: Difference Between Linear and Logistic Regression: A Comprehensive Guide for Beginners in 2025
3. Polynomial Regression
Polynomial Regression extends linear regression by adding polynomial terms to the model. It is useful when the relationship between the independent and dependent variables is non-linear (curved) but can be approximated by a polynomial curve.
In essence, you create new features as powers of the original feature(s) and then perform linear regression on the expanded feature set.
For example, a quadratic regression on one feature x would use x^2 as an additional feature – find the formula below:
y = b0 + b1*x + b2*x^2 + e
- x^2 is a second-degree polynomial term
- b0, b1, b2 are the intercept and coefficients, respectively
- e is the error term
In general, for a polynomial of degree d:
y = b0 + b1*x + b2*x^2 + ... + bd*x^d + e
- bd*x^d represents the highest-order polynomial term
- b0 ... bd are coefficients to be estimated
- e is the error term
This is still a linear model in terms of the coefficients (β’s), but the features are non-linear (powers of x). Polynomial regression can capture curvature by fitting a polynomial line instead of a straight line.
Note that polynomial regression can be done with multiple features, too (including interaction terms), though it quickly increases the number of terms.
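As a minimal sketch of that multi-feature case (the two feature names and values below are made up for illustration), PolynomialFeatures expands two inputs into squared and interaction terms; get_feature_names_out is available in recent scikit-learn versions.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
# Two hypothetical features, e.g., [size, age]
X_two = np.array([[2.0, 10.0], [3.0, 5.0]])
poly2 = PolynomialFeatures(degree=2, include_bias=False)
X_two_poly = poly2.fit_transform(X_two)
# Resulting columns: size, age, size^2, size*age, age^2
print(poly2.get_feature_names_out(["size", "age"]))
print(X_two_poly)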
Key Characteristics of Polynomial Regression
- Captures Non-Linear Patterns: Increases the polynomial degree to model more complex relationships.
- Risk of Overfitting: Higher-degree polynomials may fit training data too well, leading to wild oscillations.
- Commonly 2nd or 3rd Degree: Quadratic or cubic polynomials are typical for moderate non-linearity.
- Visual Validation: Checking extrapolation (especially at data edges) is crucial because high-degree polynomials can behave unpredictably outside the training range.
- Still a Linear Model: Coefficients are solved with linear methods, but the features are polynomial transformations, making it a “non-linear” fit.
Code Snippet
Here’s an illustration of polynomial regression by fitting a quadratic curve. You’ll use PolynomialFeatures to generate polynomial features and then a linear regression on those:
If you run the code, you will see the coefficients and the prediction for a new input.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# Sample training data (X vs y with a non-linear relationship)
X_train = np.array([[1], [2], [3], [4], [5]]) # e.g., years of experience
y_train = np.array([2, 5, 10, 17, 26]) # e.g., performance metric that grows non-linearly
# Transform features to include polynomial terms up to degree 2 (quadratic)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_train) # adds X^2 as a feature
# Fit linear regression on the polynomial features
poly_model = LinearRegression().fit(X_poly, y_train)
print("Learned coefficients:", poly_model.coef_)
print("Learned intercept:", poly_model.intercept_)
# Predict on a new data point (e.g., 6 years of experience)
new_X = np.array([[6]])
new_X_poly = poly.transform(new_X)
pred_y = poly_model.predict(new_X_poly)
print("Predicted performance for 6 years:", pred_y[0])
Output
Depending on floating-point precision, the exact numbers may differ slightly, but they will generally show:
- An intercept close to 1
- A first coefficient near 0 (for the linear term)
- A second coefficient near 1 (for the quadratic term)
- A predicted performance of about 37 for 6 years of experience.
Learned coefficients: [0. 1.]
Learned intercept: 1.0
Predicted performance for 6 years: 37.0
Real-World Applications of Polynomial Regression
| Example Scenario | Description |
| --- | --- |
| Economics – Diminishing Returns | Model diminishing returns (like ad spend vs. sales). |
| Growth Curves | Approximate certain growth patterns with a polynomial. |
| Physics – Trajectories | Predict projectile motion with a quadratic term. |
| Trend Analysis | Fit non-linear trends in data with polynomial terms. |
Polynomial regression is basically performing a non-linear regression in machine learning while still using the efficient linear regression solvers. If the curve is too complex, other methods (like decision trees) might be more appropriate, but polynomial regression is a quick way to try to capture non-linearity.
4. Ridge Regression (L2 Regularization)
Ridge Regression is a linear regression variant that addresses some limitations of ordinary least squares by adding a regularization term (penalty) to the loss function. It is also known as L2 regularization.
The ridge regression minimizes a modified cost function – the formula is given below:
Cost_ridge = Σ(from i=1 to N) [ (y_i - ŷ_i)^2 ] + λ * Σ(from j=1 to p) [ (β_j)^2 ]
In the equation above:
- The first term is the usual sum of squared errors.
- The second term, λ * Σ( (β_j)^2 ), is the L2 penalty.
- λ (lambda) controls how strongly coefficients are shrunk toward zero.
This penalty term shrinks the coefficients towards zero (but unlike Lasso, it never fully zeros them out). By adding this bias, ridge regression can reduce variance at the cost of a bit of bias, helping to prevent overfitting.
When to Use?
- Multicollinearity: Helps control variance when independent variables are highly correlated.
- High-Dimensional Data: Useful when the number of features exceeds (or is comparable to) the number of samples.
- Hyperparameter Tuning: Typically done via cross-validation to find the optimal λ.
Ridge regression is also helpful when the number of features is large relative to the number of data points.
Range of λ:
- λ = 0 ⇒ Reduces to ordinary least squares.
- λ → ∞ ⇒ Drives coefficients toward zero.
Key Characteristics of Ridge Regression
- Keeps All Features: Distributes penalty across coefficients, shrinking them but never zeroing out.
- No Feature Selection: Unlike Lasso, Ridge does not eliminate features entirely.
- Still Linear: The underlying model is linear, only with an added penalty term.
- Improves Predictive Accuracy: Especially effective in high-dimensional settings if λ (the regularization parameter) is tuned properly.
Code Snippet
Using scikit-learn’s Ridge class, you can fit a ridge model. You’ll reuse the polynomial features example but apply ridge to it to see the effect of regularization.
This is similar to linear regression but with a penalty. If you compare ridge_model.coef_ to the earlier linear poly_model.coef_, you’d notice the ridge coefficients are smaller in magnitude (pulled closer to zero). By adjusting alpha, you can increase or decrease this effect. In practice, one would tune alpha to find a sweet spot between bias and variance.
from sklearn.linear_model import Ridge
# Using the polynomial features from earlier example (X_poly, y_train)
ridge_model = Ridge(alpha=1.0) # alpha is λ in sklearn (1.0 is a moderate penalty)
ridge_model.fit(X_poly, y_train)
print("Ridge coefficients:", ridge_model.coef_)
print("Ridge intercept:", ridge_model.intercept_)
Output
Because the polynomial relationship y = 1 + x^2 already fits the data perfectly, Ridge introduces only a small numeric difference from the exact solution of [0, 1] with intercept 1.
In other words, you get nearly the same fit as the plain polynomial regression but with tiny floating-point variations.
Ridge coefficients: [-3.60822483e-16 1.00000000e+00]
Ridge intercept: 1.0
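In practice, λ is chosen by cross-validation rather than fixed by hand. Below is a minimal sketch using scikit-learn's RidgeCV on the same polynomial features; the candidate alphas are arbitrary examples.
from sklearn.linear_model import RidgeCV
# Try several penalty strengths and keep the one that cross-validates best
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=3)
ridge_cv.fit(X_poly, y_train)
print("Best alpha:", ridge_cv.alpha_)
print("Coefficients at best alpha:", ridge_cv.coef_)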
Real-World Applications of Ridge Regression
| Example Scenario | Description |
| --- | --- |
| Portfolio Risk Modeling | Predict returns from correlated indicators. |
| Medical Data (Multi-omics) | Model disease progression from many correlated genomic features. |
| Manufacturing | Predict quality from correlated process parameters. |
| General Regularized Prediction | Good when p >> n to reduce overfitting while keeping all features. |
In summary, ridge regression introduces bias to achieve lower variance in predictions – a desirable trade-off in many practical machine learning regression problems.
5. Lasso Regression (L1 Regularization)
Lasso Regression (Least Absolute Shrinkage and Selection Operator) is another regularized version of linear regression, but it uses an L1 penalty instead of L2.
Here’s the cost function for Lasso:
Cost_lasso = Σ(from i=1 to N) [ (y_i - ŷ_i)^2 ] + λ * Σ(from j=1 to p) [ |β_j| ]
- The first term is the sum of squared errors.
- The second term, λ * Σ( |β_j| ), is the L1 penalty that drives some coefficients to zero.
This has a special property: it can drive some coefficients exactly to zero when λ is sufficiently large, effectively performing feature selection. Lasso regression thus not only helps with overfitting but can produce a more interpretable model by eliminating irrelevant features.
When to Use?
- Feature Importance: Ideal when you suspect only a few predictors matter; Lasso drives irrelevant feature coefficients to zero.
- High-Dimensional Data: Reduces overfitting by regularizing many predictors and producing a sparse model.
- Automatic Feature Selection: Simultaneously performs regression and identifies the most important features.
- Caveat with Correlated Features: Lasso may arbitrarily choose one feature from a correlated group and drop others; consider Elastic Net to mitigate this issue.
Key Characteristics of Lasso Regression
- Coefficient Zeroing: Unlike Ridge, large penalty values can drive some coefficients all the way to zero.
- λ Controls Sparsity: At λ = 0, Lasso is just ordinary least squares. As λ increases, the model becomes sparser, shutting off more features.
- Linear Model: Same assumptions as Ridge about linear relationships, but with L1 penalty instead of L2.
Code Snippet
Using scikit-learn’s Lasso class is straightforward. You’ll apply it to the same polynomial example for illustration.
After fitting, you may find that some coefficients are exactly 0. For example, if we had many polynomial terms, Lasso might zero out the higher-degree ones if they’re not contributing much.
In this small example with just 2 features [x and x^2], it might keep both, but with different values than ridge or OLS.
from sklearn.linear_model import Lasso
lasso_model = Lasso(alpha=0.5) # alpha is λ, here chosen moderately
lasso_model.fit(X_poly, y_train)
print("Lasso coefficients:", lasso_model.coef_)
print("Lasso intercept:", lasso_model.intercept_)
Output
Because the true function y = 1 + x^2 already fits the data perfectly, Lasso finds an intercept close to 1 and a quadratic coefficient close to 1, with no need for a linear term (coefficient ~ 0). Any small differences from exact values are just floating-point or solver tolerance effects.
Lasso coefficients: [0. 1.]
Lasso intercept: 1.0
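As with Ridge, λ is usually tuned via cross-validation. Below is a minimal sketch using LassoCV on the same polynomial features; the candidate alphas are arbitrary, and on such a tiny dataset this is purely illustrative.
from sklearn.linear_model import LassoCV
lasso_cv = LassoCV(alphas=[0.01, 0.1, 0.5, 1.0], cv=3)
lasso_cv.fit(X_poly, y_train)
print("Best alpha:", lasso_cv.alpha_)
print("Coefficients at best alpha:", lasso_cv.coef_)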
Real-World Applications of Lasso Regression
| Example Scenario | Description |
| --- | --- |
| Sparse Signal Recovery | Identify relevant signals/genes by zeroing out others. |
| Finance – Key Indicators | Pick top indicators from hundreds for stock price modeling. |
| Marketing – Feature Selection | Select main drivers of customer spend from many features. |
| Environment Modeling | Identify key sensors for air quality from wide sensor data. |
6. Elastic Net Regression
Elastic Net Regression combines the penalties of ridge and lasso to get the benefits of both. Its penalty is a mix of L1 and L2 – the formula is listed below:
Cost_elastic_net = Σ (y_i - ŷ_i)^2 + λ [ α Σ|β_j| + (1 - α) Σ(β_j)^2 ]
In the equation above:
- The first term is the sum of squared errors.
- The second term is a mix of L1 (Lasso) and L2 (Ridge) penalties:
- α=1 ⇒ pure L1 penalty
- α=0 ⇒ pure L2 penalty
In practice, one chooses a fixed α between 0 and 1 (e.g., 0.5 for an even mix) and then tunes λ. Elastic Net thus simultaneously performs coefficient shrinkage and can zero out some coefficients.
When to Use?
- Correlated Features: If multiple predictors are highly correlated, Elastic Net avoids the arbitrary selection that Lasso might do and keeps relevant groups together.
- Balancing Ridge and Lasso: When pure Ridge doesn’t provide enough feature selection and pure Lasso is too aggressive with correlated features, Elastic Net offers a middle ground.
- High-Dimensional Data: Particularly helpful when there are more predictors than observations, as the mixed penalty provides both regularization and some sparsity.
Key Characteristics of Elastic Net Regression
- Hybrid Regularization: Combines L1 (Lasso) and L2 (Ridge) penalties, balancing feature selection and coefficient shrinkage.
- Two Hyperparameters: α (mixing ratio) and λ (overall penalty strength), both typically tuned via cross-validation.
- Stability with Correlated Features: More stable than Lasso alone when dealing with highly correlated predictors (0 < α < 1 often yields superior accuracy).
- Comparable to Ridge + Sparse Like Lasso: Achieves performance close to Ridge while still zeroing out some coefficients.
Code Snippet
Scikit-learn’s ElasticNet allows setting both α (l1_ratio in sklearn) and λ (alpha in sklearn).
In this example, alpha=0.1 is a moderate regularization strength and l1_ratio=0.5 gives equal weight to L1 and L2 penalties. The resulting coefficients will be somewhere between ridge and lasso in effect.
Let’s demonstrate:
from sklearn.linear_model import ElasticNet
# ElasticNet with 50% L1, 50% L2 (l1_ratio=0.5)
en_model = ElasticNet(alpha=0.1, l1_ratio=0.5) # alpha is overall strength (λ), l1_ratio is mix
en_model.fit(X_poly, y_train)
print("Elastic Net coefficients:", en_model.coef_)
print("Elastic Net intercept:", en_model.intercept_)
Output
Because the true function is y = 1 + x^2, the model typically learns:
- An intercept near 1
- A near-zero linear term
- A quadratic term near 1
You might see tiny numerical deviations (e.g., 0.9999) due to floating-point precision and regularization.
Elastic Net coefficients: [0. 1.]
Elastic Net intercept: 1.0
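Both hyperparameters can be tuned together with ElasticNetCV, which cross-validates over candidate l1_ratio and alpha values. Below is a minimal sketch on the same polynomial features; the candidate values are arbitrary examples.
from sklearn.linear_model import ElasticNetCV
en_cv = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], alphas=[0.01, 0.1, 1.0], cv=3)
en_cv.fit(X_poly, y_train)
print("Best l1_ratio:", en_cv.l1_ratio_)
print("Best alpha:", en_cv.alpha_)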
Real-World Applications of Elastic Net
| Example Scenario | Description |
| --- | --- |
| Genetics | Keep or drop correlated gene groups together. |
| Economics | Group correlated indicators (e.g., inflation, interest). |
| Retail | Retain or discard correlated store features. |
| General high-dimensional data | Good compromise of shrinkage & selection when p >> n. |
7. Stepwise Regression
Stepwise Regression is a variable selection method rather than a distinct regression model. It refers to an iterative procedure of adding or removing features from a regression model based on certain criteria (like p-values, AIC, BIC, or cross-validation performance). The goal is to arrive at a compact model with a subset of features that provides the best fit.
There are two main approaches:
- Forward selection: It starts with no features, then adds the most significant features one by one
- Backward elimination: It starts with all candidate features, then removes the least significant one by one
A combination of both (adding and removing) is often called stepwise (or bidirectional) selection.
When to Use?
- Model Simplification: Helpful if you have many features and want to narrow down to the most relevant ones.
- Exploratory or Initial Model Building: Commonly used to identify a subset of significant predictors before moving to more complex methods.
- Automatic Feature Selection: Variables are added or removed based on statistical tests or information criteria, reducing subjective bias.
Key Characteristics of Stepwise Regression
- Subset of Features: Produces a standard regression model (e.g., linear, logistic) but with a reduced set of predictors.
- Flexible Application: Can be used with various regression methods.
- Overfitting Risk: Involves repeated tests on the same data, so it needs validation (e.g., cross-validation, AIC/BIC) to avoid overfitting.
- Modern Alternatives: Regularization (Ridge/Lasso) or tree-based approaches are often preferred, but stepwise remains popular for simplicity and interpretability.
Code Snippet
There isn’t a built-in scikit-learn function named “stepwise”, but one can implement forward or backward selection. Sklearn’s SequentialFeatureSelector can do this.
This will select 5 best features (you can adjust that or use cross-validation to decide when to stop).
- Backward elimination would use direction='backward'.
- The output selected_feats gives the indices of features chosen.
- In R, one has step() function for stepwise; in Python, one might use statsmodels or a DIY loop, but the idea is the same.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
# Assume X_train is a DataFrame or array with many features and y_train is the target
lr = LinearRegression()
sfs_forward = SequentialFeatureSelector(lr, n_features_to_select=5, direction='forward')
sfs_forward.fit(X_train, y_train)
selected_feats = sfs_forward.get_support(indices=True)
print("Selected feature indices:", selected_feats)
Output
A typical output — assuming X_train has multiple features — could look like this:
Selected feature indices: [0 2 4 7 9]
The exact indices depend on your dataset. The array shows which feature columns (by index) were chosen when selecting 5 features in forward selection mode.
Real-World Applications of Stepwise Regression
| Example Scenario | Description |
| --- | --- |
| Medical Research (Predictors) | Narrow down from many health factors. |
| Economic Modeling | Find a small subset of indicators for GDP. |
| Academic Research | Identify top variables among many measured. |
| Initial Feature Screening | Get a quick feature subset before advanced models. |
Remember that stepwise methods should be validated on a separate test set to ensure the selected features generalize. They provide one way to handle different types of regression analysis by focusing on the most impactful predictors.
8. Decision Tree Regression
Decision Tree Regression in machine learning is a non-parametric model that predicts a continuous value by learning decision rules from the data.
It builds a binary tree structure: at each node of the tree, the data is split based on a feature and a threshold value, such that the target values in each split are as homogeneous as possible.
This splitting continues recursively until a stopping criterion is met (e.g., minimum number of samples in a leaf or maximum tree depth). The leaf nodes of the tree contain a prediction value (often the mean of the target values that fall in that leaf).
In essence, a decision tree regression partitions the feature space into rectangular regions and fits a simple model (constant value) in each region. The result is a piecewise constant approximation to the target function.
Unlike linear models, decision trees can capture nonlinear interactions between features easily by their branching structure.
Key Characteristics of Decision Tree Regression
- Non-linear and Non-parametric: They do not assume any functional form; given enough depth, the model can learn arbitrary relationships.
- Interpretability: You can visualize the tree to understand how decisions are made (which features and thresholds). This is great for explaining models.
- Prone to Overfitting: If grown deep, trees can overfit (memorize the training data). Pruning or setting depth limits is important.
- Handling of Features: Trees can handle both numerical and categorical features (there is no need for one-hot encoding if the implementation supports categorical splits). They also implicitly handle feature interactions.
Decision tree regression will exactly fit any data if not constrained, so typically, one limits depth or requires a minimum number of samples per leaf to prevent too many splits.
Code Snippet
Here, you will limit max_depth to 3 to prevent an overly complex tree. The tree will find splits on the age feature to partition the income values. The prediction for age 18 would fall into one of the learned leaf intervals, and the average income for that interval would be output.
from sklearn.tree import DecisionTreeRegressor
# Sample data: predicting y from X (where relationship may be non-linear)
X_train = [[5], [10], [17], [20], [25]] # e.g., years of age
y_train = [100, 150, 170, 160, 180] # e.g., some income that rises then dips then rises
tree = DecisionTreeRegressor(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
# Make a prediction
print("Predicted value for 18:", tree.predict([[18]])[0])
Output
Because 18 ends up in the same leaf as (20 → 160) and (25 → 180) or a similar grouping, the tree’s average value for that leaf is around 170. The exact partition can vary slightly depending on the data and settings, but you’ll likely see a result close to 170.
Predicted value for 18: 170.0
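Since interpretability is one of the main draws of a single tree, you can also print the learned splits as plain text. Below is a minimal sketch using scikit-learn's export_text on the tree fitted above; the feature name "age" is just a display label.
from sklearn.tree import export_text
# Print the learned decision rules (feature thresholds and leaf values)
print(export_text(tree, feature_names=["age"]))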
Real-World Applications of Decision Tree Regression
| Example Scenario | Description |
| --- | --- |
| House Price Prediction (Rules) | Split houses by location, size, etc. for a final price leaf. |
| Medicine – Dosage Effect | Split on dose and age for predicted response. |
| Manufacturing Quality | Split on sensor readings for quality. |
| Customer Value Prediction | Segments customers into leaves for value. |
9. Random Forest Regression
Random Forest Regression is an ensemble learning method that builds on decision trees. The idea is to create a large number of decision trees (a forest) and aggregate their predictions (typically by averaging for regression).
Each individual tree is trained on a random subset of the data and/or features (hence "random"). Specifically, random forests use the following:
- Bootstrap Aggregation (Bagging): Each tree gets a bootstrap sample (random sample with replacement) of the training data.
- Feature Randomness: At each split, a random subset of features (rather than all features) is considered for splitting.
These two sources of randomness make the trees diverse. While any single tree might overfit, the average of many overfitting trees can significantly reduce variance. Random forests thus achieve better generalization than a single tree while maintaining the ability to handle non-linear relationships.
Key Characteristics of Random Forest Regression
- High Accuracy: Often one of the most accurate out-of-the-box regressors because it can capture complex relationships and averages out noise.
- Robustness: Less overfitting than individual deep trees, though very deep forests can still overfit some. You can tune the number of trees and depth.
- Feature Importance: Random forests can give each feature an importance score (based on how much it reduces error across splits), which is useful for insight.
- Less Interpretable: Because it combines many trees, the overall model is less interpretable than a single decision tree. Partial dependence plots or feature importances are used to interpret it.
- Handles Non-linearity and Interactions: It does so very well since each tree and the ensemble can model complex patterns.
Code Snippet
Here, you create a forest of 100 trees (common default). max_depth=3 to keep each tree small for interpretability (in practice, you might let them grow deeper or until leaf size is minimal).
The prediction for 18 will be an average of 100 different decision tree predictions, yielding a more stable result than an individual tree.
from sklearn.ensemble import RandomForestRegressor
# Continuing with the previous example data
rf = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=42)
rf.fit(X_train, y_train)
print("Random Forest prediction for 18:", rf.predict([[18]])[0])
Output
A common prediction when running this random forest code is around:
Random Forest prediction for 18: 168.4
The exact number can vary slightly because of the following reasons:
- The forest averages predictions from multiple decision trees (100 by default).
- Each tree’s random sample (bootstrap) and random feature selection can produce different splits.
- Setting a different random_state or changing hyperparameters will alter the estimate.
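As noted under the key characteristics, a fitted forest also exposes per-feature importance scores. Below is a minimal sketch using the rf model trained above; with a single feature the score is trivially 1.0, but the same call applies to wider datasets, and the name "age" is just a display label.
# Importance scores sum to 1 across features
for name, score in zip(["age"], rf.feature_importances_):
    print(name, "importance:", score)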
Real-World Applications of Random Forest Regression
| Example Scenario | Description |
| --- | --- |
| Stock Market Prediction | Average many decision trees for stable forecasts. |
| Energy Load Forecasting | Capture complex weather-demand interactions. |
| Predicting Equipment Failure Time | Average multiple trees for robust time-to-failure. |
| General Tabular Data Regression | Often a top choice for structured data. |
Random forests are a go-to machine learning regression algorithm when you want good performance with minimal tuning. They handle a variety of data types and are resilient to outliers and scaling issues (no need for normalization typically). The main downsides are model size (hundreds of trees can be large) and interpretability.
Also Read: Random Forest Algorithm: When to Use & How to Use? [With Pros & Cons]
10. Support Vector Regression (SVR)
Support Vector Regression applies the principles of Support Vector Machines (SVM) to regression problems. The idea is to find a function (e.g., a hyperplane in feature space) that deviates from the actual targets by, at most, a certain epsilon (ε) for each training point and is as flat as possible.
In SVR, you specify an ε-insensitive zone: if a prediction is within ε of the true value, the model does not incur a loss for that point. Only points outside this margin (where the prediction error is larger than ε) will contribute to the loss – those are called support vectors.
Mathematically, for linear SVR, we solve an optimization problem:
Minimize: (1/2) * ||w||^2
Subject to: |y_i - (w · x_i + b)| ≤ ε for all i
Intuitively, it tries to fit a tube of width 2ε around the data. A wider tube (large ε) allows more error but fewer support vectors (so a simpler model), while a narrower tube forces precision (potentially more complex model).
Kernel tricks can be used to perform non-linear regression by mapping features to higher-dimensional spaces, similar to SVM classification.
Key Characteristics of SVR
- Robustness to Outliers: By ignoring errors within ε, SVR can be robust to some noise (small deviations don’t matter).
- Flexibility with Kernels: You can use Gaussian (RBF) kernel, polynomial kernel, etc., to capture non-linear relationships much like in SVM classification. This makes SVR very powerful for certain pattern-fitting tasks.
- Complexity: Training SVR can be slower than linear regression or tree models, especially on large datasets, because it involves solving a quadratic programming problem. SVR is more commonly used on smaller to medium datasets.
- Parameters to Tune: ε (margin width), C (regularization parameter controlling trade-off between flatness and amount of error allowed outside ε), and kernel parameters (if using a kernel).
- Not as Scalable: For very large datasets (n in tens of thousands or more), tree ensembles or linear models are often preferred. But SVR shines for moderate data with complex relationships.
Code Snippet
In this example, the RBF kernel SVR will fit a smooth curve through the points within the tolerance ε. The predicted value for 2.5 will lie on that learned curve.
from sklearn.svm import SVR
# Sample non-linear data (for demonstration)
X_train = [[0], [1], [2], [3], [4], [5]]
y_train = [0.5, 2.2, 3.9, 5.1, 4.9, 6.8] # somewhat nonlinear progression
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)
# Predict on a new point
print("SVR prediction for 2.5:", svr.predict([[2.5]])[0])
Output
A typical prediction when running this SVR code is around:
SVR prediction for 2.5: 4.3
The exact number can vary due to the RBF kernel’s smoothing and default hyperparameters, but you’ll generally see a value between 3.9 (X=2) and 5.1 (X=3).
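Because SVR's performance depends heavily on C, epsilon, and the kernel settings, these are usually tuned with a grid search and cross-validation. Below is a minimal sketch with GridSearchCV, reusing X_train and y_train from above; the candidate grid values are arbitrary examples.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR
param_grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.05, 0.1, 0.5]}
grid = GridSearchCV(SVR(kernel='rbf'), param_grid, cv=3)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Tuned prediction for 2.5:", grid.best_estimator_.predict([[2.5]])[0])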
Real-World Applications of SVR
| Example Scenario | Description |
| --- | --- |
| Financial Time Series | Fit complex patterns with RBF kernel. |
| Temperature Prediction | Model non-linear relationships with little data. |
| Engineering – Smoothing | Tolerate small errors within ε for a smooth curve. |
| Small Dataset Regression | Capture complexity when data is limited. |
SVR is one of the more advanced regression models in machine learning. If tuned well, it offers a good balance between bias and variance, particularly for smaller datasets with non-linear relationships.
You can also check out upGrad’s free tutorial, Support Vector Machine (SVM) for Anomaly Detection. Learn how it works and its step-by-step implementation.
11. Principal Component Regression (PCR)
Principal Component Regression is a technique that combines Principal Component Analysis (PCA) with linear regression. The main idea is to address multicollinearity and high dimensionality by first transforming the original features into a smaller set of principal components (which are uncorrelated) and then using those components as predictors in a regression model.
Steps in PCR:
- Perform PCA on the feature matrix X. PCA finds new orthogonal dimensions (principal components) that capture the maximum variance in the data. You usually select the top k components that explain a large portion of variance.
- Use these k principal component scores as the independent variables in a linear regression to predict y.
- The regression yields coefficients for those components. If needed, you can interpret them in terms of the original variables by reversing the PCA transformation.
When to Use?
- Correlated Features: This is ideal if your predictors exhibit strong multicollinearity since PCR replaces original features with orthogonal principal components.
- High-Dimensional Data: Frequently applied in chemometrics or genomics, where the number of features can be very large.
- Noise Reduction: By discarding lower-variance components, PCR filters out directions that don’t contribute meaningfully to predicting y.
- Enhanced Prediction: Restricting the model to a handful of main components often prevents overfitting and boosts generalization.
Key Characteristics of PCR
- Unsupervised PCA: Components are derived without regard to y, which can occasionally lead to discarding potentially informative variance.
- Component Selection: Choosing how many principal components to keep is often guided by cross-validation to optimize predictive performance.
- Stable in High Dimensions: Helps mitigate overfitting and handles collinearity, somewhat akin to Ridge in reducing model complexity.
- Reduced Interpretability: Principal components are linear combinations of the original features, making domain interpretation more challenging.
Code Snippet
In this snippet, you will reduce 100 original features to 10 components and then fit a regression on those 10. One would choose 10 by checking how much variance those components explain or via CV.
The pca.components_ attribute can tell us which original features contribute to each component, but interpretation is not as straightforward as a normal linear model.
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
# Suppose X_train has 100 features, and we suspect many are correlated/redundant
pca = PCA(n_components=10) # reduce to 10 components
X_train_reduced = pca.fit_transform(X_train)
# Now X_train_reduced has 10 features (principal components). Do linear regression on these.
lr = LinearRegression()
lr.fit(X_train_reduced, y_train)
# To predict on new data, remember to transform it through the same PCA:
X_test_reduced = pca.transform(X_test)
predictions = lr.predict(X_test_reduced)
Output
Below is a concise example of what your console might display:
- X_train_reduced shape: Confirms the 10 principal components for each of your training samples.
- Coefficients & Intercept: The linear regression parameters learned on the PCA-transformed data.
- Predictions on X_test: One or more predicted values for your test set after applying the same PCA transformation.
X_train_reduced shape: (200, 10)
Coefficients: [ 0.12 -0.05 0.07 -0.01 0.03 0.02 0.09 -0.08 0.01 0.04]
Intercept: 3.2
Predictions on X_test:
[10.81 9.95 11.42 8.77 ... ]
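Two practical refinements are worth noting: checking how much variance the retained components explain, and chaining PCA and regression in a Pipeline so test data is always transformed the same way. Below is a minimal sketch under the same assumptions as above (X_train, y_train, and X_test already exist with enough features and samples).
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
# Chain PCA and regression so predict() applies the identical transformation
pcr = Pipeline([("pca", PCA(n_components=10)), ("lr", LinearRegression())])
pcr.fit(X_train, y_train)
print("Variance explained by 10 components:", pcr.named_steps["pca"].explained_variance_ratio_.sum())
print("First predictions:", pcr.predict(X_test)[:5])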
Real-World Applications of Principal Component Regression
| Example Scenario | Description |
| --- | --- |
| Chemometrics | Predict concentration from many correlated spectral features. |
| Image Regression | Reduce dimensionality (eigenfaces) before regression. |
| Economic Indices | Combine many correlated indicators into fewer components. |
| Environmental Data | Compress redundant sensors, then predict outcome. |
PCR is valuable when you have more features than observations or highly correlated inputs. By focusing on the major components of variation in the data, it trades a bit of optimal predictive power for a simpler, more robust model.
Also Read: PCA in Machine Learning: Assumptions, Steps to Apply & Applications
12. Partial Least Squares (PLS) Regression
Partial Least Squares Regression (PLS) is another technique for dealing with high-dimensional, collinear data. It is somewhat similar to PCR but with an important twist: PLS is a supervised method.
It finds new features (components) that are linear combinations of the original predictors while also taking into account the response variable y in determining those components. In other words, PLS tries to find directions in the feature space that have high covariance with the target. This often makes PLS more effective than PCR in predictive tasks because it doesn’t ignore y when reducing dimensions.
PLS produces a set of components (also called latent vectors) with two sets of weights: one for transforming X and one for y (for multivariate Y, though in simple regression Y is one-dimensional). You choose the number of components to keep similar to PCR.
When to Use?
- High-Dimensional or Noisy Data: Similar to PCR, PLS is ideal when many predictors are correlated or contain noise.
- Strong Collinearity: Especially helpful if predictors are highly collinear, as PLS finds components that best relate to y.
- Focus on Predictive Variance: Unlike PCR, PLS components are chosen by how strongly they correlate with y, reducing the risk of discarding relevant variance.
- Used in Chemometrics & Bioinformatics: Common where datasets have many potential predictors and the goal is accurate, interpretable prediction.
Key Characteristics of PLS
- Supervised Dimensionality Reduction: Typically outperforms PCR since it extracts components based on correlation with y.
- Multiple Responses: Can handle multiple dependent variables, jointly explaining both X and Y variance.
- Component Selection: Requires choosing the right number of components (too few underfits, too many overfits), often done via cross-validation.
- Interpretation Challenges: Components are linear combinations of original features, which can be difficult to map back to domain factors.
- Niche in Scientific Data: For purely predictive tasks, other methods (e.g., ridge, random forests) may suffice, but PLS is valuable in scientific analysis where both prediction and factor identification matter.
Code Snippet
This code will compute 10 PLS components and use them to fit the regression. Under the hood, it’s finding weight vectors for X and y such that covariance is maximized. You can inspect pls.x_weights_ or pls.x_loadings_ to see how original features contribute to components.
from sklearn.cross_decomposition import PLSRegression
pls = PLSRegression(n_components=10)
pls.fit(X_train, y_train)
# After fitting, we can predict normally
y_pred = pls.predict(X_test)
Output
Below is a concise example of what you might see after running this code (exact numbers will vary depending on your data):
[10.24 9.56 11.43 ...]
Here, [10.24 9.56 11.43 ...] represents the predicted y values for the samples in your X_test. Since PLSRegression is trained with 10 components, it uses those components to produce these final predictions.
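As noted under Key Characteristics, the number of components is usually chosen by cross-validation. A minimal sketch using GridSearchCV (the candidate range is illustrative):
from sklearn.model_selection import GridSearchCV
from sklearn.cross_decomposition import PLSRegression
param_grid = {'n_components': list(range(1, 16))}  # candidate component counts
search = GridSearchCV(PLSRegression(), param_grid, cv=5, scoring='r2')
search.fit(X_train, y_train)
print("Best number of components:", search.best_params_['n_components'])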
Real-World Applications of PLS Regression
| Example Scenario | Description |
| --- | --- |
| Chemistry and Spectroscopy | Focus on variations relevant to property of interest. |
| Genomics (QTL analysis) | Distill many genetic markers to latent factors correlated with phenotype. |
| Manufacturing | Identify composite process factors that drive quality. |
| Social Science | Combine correlated socioeconomic indicators that best predict an outcome. |
PLS regression is a powerful method when you're in the realm of “small n, large p” (few observations, many features) and want to reduce dimensionality in a way that’s oriented toward prediction. It fills an important niche between pure feature extraction (like PCA) and pure regression.
13. Bayesian Regression
Bayesian Regression refers to a family of regression techniques that incorporate Bayesian principles into the modeling. In contrast to classical regression, which finds single best-fit parameters (point estimates), Bayesian regression treats the model parameters (coefficients) as random variables with prior distributions. It produces a posterior distribution for these parameters given the data.
This means instead of one set of coefficients, you get a distribution (mean and uncertainty) for each coefficient, and predictions are distributions as well (with credible intervals).
One common approach is Bayesian Linear Regression: assume a prior (often Gaussian) for the coefficients β and maybe for noise variance, then update this prior with the data (likelihood) to get a posterior.
The resulting posterior can often be derived in closed-form for linear regression with conjugate priors (Gaussian prior + Gaussian likelihood yields Gaussian posterior, which is the basis of Bayesian Ridge in scikit-learn). The prediction is typically the mean of the posterior predictive distribution, and you also get uncertainty (variance).
When to Use?
- Prior Knowledge: Ideal if you have insights about coefficient values that can be encoded as a prior.
- Uncertainty Estimates: Provides predictive distributions and confidence measures, critical for fields like medicine and risk analysis.
- Small Data: Priors serve as regularizers, similar to Ridge regression, preventing overfitting.
- Sequential Updates: New data can update posterior distributions, making Bayesian methods well-suited for online or continuous learning.
Key Characteristics of Bayesian Regression
- Computational Complexity: More demanding than classical regression, especially for non-linear models.
- Distribution of Models: Produces a posterior distribution over parameters rather than a single point estimate.
- Priors Matter: Choice of priors can significantly affect results, especially with limited data; generic Gaussian priors often act like regularizers.
- Overfitting Protection: By averaging over parameter uncertainty, Bayesian regression can yield more stable estimates, particularly when data is scarce or features are correlated.
Code Snippet
Scikit-learn offers BayesianRidge, which is a Bayesian version of linear regression with a Gaussian prior on coefficients. BayesianRidge also estimates the noise variance and includes automatic tuning of priors via evidence maximization.
The output shows the mean of the coefficient posterior, from which you can derive uncertainty (the coefficients' covariance matrix is sigma_).
import numpy as np
from sklearn.linear_model import BayesianRidge
bayes_ridge = BayesianRidge()
bayes_ridge.fit(X_train, y_train)
print("Coefficients (mean of posterior):", bayes_ridge.coef_)
# sigma_ is the posterior covariance matrix of the coefficients; its diagonal holds the variances
print("Coefficient uncertainties (std of posterior):", np.sqrt(np.diag(bayes_ridge.sigma_)))
Output
Below is an example of what you might see (the exact numbers depend on your data):
Coefficients (mean of posterior): [1.02]
Coefficient uncertainties (std of posterior): [0.15]
- Coefficients (mean of posterior): The BayesianRidge estimate of each feature’s effect, averaged over the posterior distribution.
- Coefficient uncertainties (std of posterior): The standard deviations showing how confident the model is about each coefficient.
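Beyond coefficient uncertainty, BayesianRidge can also return predictive uncertainty. A minimal sketch, reusing the fitted bayes_ridge from above:
# return_std=True gives the standard deviation of the posterior predictive distribution
y_mean, y_std = bayes_ridge.predict(X_test, return_std=True)
print("Predictive means:", y_mean[:5])
print("Predictive standard deviations:", y_std[:5])
These standard deviations are what you would use to form credible intervals around each prediction.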
Real-World Applications of Bayesian Regression
| Example Scenario | Description |
| --- | --- |
| Medical Prediction (with uncertainty) | Provide predictions and confidence intervals. |
| Econometrics | Combine prior theory with data for parameter distributions. |
| Engineering - Calibration | Incorporate prior knowledge for model parameters. |
| Adaptive Modeling | Update posterior with new data for real-time personalization. |
Bayesian regression provides a probabilistic framework for regression, yielding richer information than classic point estimation. It ensures that you understand the uncertainty in predictions, which is crucial for many real-world applications where decisions depend on confidence in the results.
Also Read: Bayesian Linear Regression: What is, Function & Real Life Applications in 2024
14. Quantile Regression
Quantile Regression is a type of regression that estimates the conditional quantile (e.g., median or 90th percentile) of the response variable as a function of the predictors, instead of the mean.
Unlike ordinary least squares, which minimizes squared error (and thus focuses on the mean), quantile regression minimizes the sum of absolute errors weighted asymmetrically to target a specific quantile.
For example, Median Regression (0.5 quantile) minimizes absolute deviations (50% of points above, 50% below, akin to least absolute deviations). The 0.9 quantile regression would ensure ~90% of the residuals are negative and 10% positive (focusing on the upper end of distribution).
The quantile loss function for quantile q is: for residual r, loss = q*r if r >= 0, and = (q-1)*r if r < 0.
This creates a tilted absolute loss that penalizes over-predictions vs under-predictions differently to hit the desired quantile. Essentially, quantile regression gives a more complete view of the relationship between X and Y by modeling different points of the distribution of Y.
When to Use?
- Distribution Focus: Ideal if you need to predict specific percentiles (e.g., median or 90th percentile) rather than just the mean.
- Heteroscedastic or Non-Symmetric Errors: Captures changing variance or skew in the data better than mean-based regression.
- Risk Management: Essential in finance and insurance for predicting high-loss quantiles (Value-at-Risk).
- Outlier Robustness: Median regression (quantile=0.5) focuses on absolute deviations, making it less sensitive to outliers.
Key Characteristics of Quantile Regression
- No Distribution Assumption: Non-parametric approach, making it useful when OLS assumptions don’t hold.
- Multiple Quantiles: Each quantile is fitted independently; you can create a “fan of lines” for a more comprehensive view of the response distribution.
- Linear Programming: Often solved via linear programming methods; implementations are available in R (quantreg) and Python (statsmodels).
- Robust to Outliers: Median regression (quantile=0.5) is less sensitive to extreme values than mean-based OLS.
- Quantile-Specific Interpretation: A coefficient reflects how a predictor affects a chosen quantile of the outcome (e.g., the 0.9 quantile for high-end prices).
Code Snippet
Scikit-learn offers two main routes: the dedicated QuantileRegressor (available since v1.0) or ensemble methods with a quantile loss. For illustration, we’ve used QuantileRegressor for the median.
This will fit a median regression (0.5 quantile).
- If you want the 90th percentile, use quantile=0.9. (QuantileRegressor solves a linear program and can also take an alpha for an L1, lasso-like penalty if needed.)
- Alternatively, you can use an ensemble method, such as GradientBoostingRegressor with loss='quantile' and alpha=0.9, to get predictions for the 90th percentile (see the sketch after the output below).
# This snippet assumes scikit-learn >= 1.0, which introduced QuantileRegressor
from sklearn.linear_model import QuantileRegressor
median_reg = QuantileRegressor(quantile=0.5, alpha=0) # alpha=0 for no regularization
median_reg.fit(X_train, y_train)
print("Coefficients for median regression:", median_reg.coef_)
Output
A typical console output (assuming a single feature) might be:
Coefficients for median regression: [1.]
If you had multiple features, you’d see something like [1.0 -0.2 0.5 ...]. Exact values depend on your dataset, but the array represents the slope estimates for the specified quantile (in this case, the median).
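As mentioned above, an ensemble alternative is gradient boosting with the quantile loss. A minimal sketch targeting the 90th percentile:
from sklearn.ensemble import GradientBoostingRegressor
# loss='quantile' with alpha=0.9 makes the trees estimate the conditional 90th percentile
gbr_q90 = GradientBoostingRegressor(loss='quantile', alpha=0.9)
gbr_q90.fit(X_train, y_train)
upper_estimates = gbr_q90.predict(X_test)  # roughly 90% of actual values should fall below these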
Real-World Applications of Quantile Regression
| Example Scenario | Description |
| --- | --- |
| Housing Market Analysis | Predict 10th/90th percentile house prices. |
| Weather and Climate | Model extreme rainfall/temperature quantiles. |
| Traffic and Travel Time | Estimate upper travel time bounds for planning. |
| Finance – Value at Risk | Directly model high quantile losses for risk. |
Quantile regression adds another dimension to understanding predictive relationships by not restricting to the mean outcome. It is a powerful tool when distributions are skewed, have outliers, or when different quantiles exhibit different relationships with predictors.
15. Poisson Regression
Poisson Regression is a type of generalized linear model (GLM) used for modeling count data in situations where the response variable is a count (0, 1, 2, ...) that often follows a Poisson distribution.
It is appropriate when the counts are assumed to occur independently, and the mean of the distribution equals its variance (a property of Poisson, though this can be relaxed later).
Commonly, Poisson regression models the log of the expected count as a linear combination of features:
log(E[Y | X]) = b0 + b1*x1 + ... + bp*xp
Exponentiating both sides:
E[Y | X] = exp(b0 + b1*x1 + ... + bp*xp)
- The log link ensures E[Y | X] is always positive.
- b0, b1, ..., bp are the coefficients for the predictor variables x1, x2, ..., xp.
The model is typically fitted by maximum likelihood (equivalent to minimizing deviance for GLM). Poisson regression assumes the conditional distribution of Y given X is Poisson, which implies variance = mean for those counts.
Poisson might not fit well if the data show overdispersion (variance > mean). Then, variants like quasi-Poisson or Negative Binomial can be used.
When to Use?
- Count Data: Best for modeling event counts (visits, accidents, calls) in a set time or space.
- Low or Moderate Counts: Works well if there isn’t excessive overdispersion (variance much larger than mean).
- Exposure Offsets: Can handle varying exposure (e.g., per 1,000 people, per hour) through offset terms.
- Discrete Outcomes: Ideal if the target variable is non-negative integer counts.
Key Characteristics of Poisson Regression
- Log-Scale Modeling: Predictions remain non-negative because the expected count is modeled with an exponential link.
- Limitations with Zeros & Overdispersion: Excess zeros or variance require adjustments like zero-inflated or negative binomial models.
- Count Regression Family: Poisson is one member of a broader set of models for discrete, non-negative outcomes.
Code Snippet
Python’s statsmodels can fit GLMs, including Poisson.
In scikit-learn, you can also use PoissonRegressor for a more machine-learning-style API; it fits the model by minimizing the (L2-penalized) Poisson deviance (see the sketch after the output notes below).
import statsmodels.api as sm
# Assume X_train is a 2D array of features, y_train are count outcomes
X_train_sm = sm.add_constant(X_train) # add intercept term
poisson_model = sm.GLM(y_train, X_train_sm, family=sm.families.Poisson())
poisson_results = poisson_model.fit()
print(poisson_results.summary())
Output
Below is an example of the Poisson regression summary you might see (details vary with your data):
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y_train No. Observations: 100
Model: GLM Df Residuals: 98
Model Family: Poisson Df Model: 1
Link Function: log Scale: 1.0000
Method: IRLS Log-Likelihood: -220.3045
Date: Thu, 01 Jan 2025 Deviance: 45.6322
Time: 00:00:00 Pearson chi2: 44.581
No. Iterations: 4 Pseudo R-squ. (CS): 0.2183
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -2.1056 0.233 -9.042 0.000 -2.563 -1.648
x1 0.5076 0.093 5.464 0.000 0.325 0.690
==============================================================================
Please Note:
- Model Family: Poisson, indicating a count model.
- Log-Likelihood: Measure of model fit (higher is generally better).
- Coefficients: The effect of each predictor on the log of the outcome’s expected value (the Poisson link function).
- z and P>|z|: Tests if each coefficient significantly differs from zero.
- [0.025, 0.975]: The 95% confidence interval for each coefficient.
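For the scikit-learn route mentioned earlier, a minimal sketch with PoissonRegressor (alpha is the L2 regularization strength; 0 gives an unpenalized fit):
from sklearn.linear_model import PoissonRegressor
pois = PoissonRegressor(alpha=0)
pois.fit(X_train, y_train)
print("Coefficients:", pois.coef_)
print("Intercept:", pois.intercept_)
# Coefficients are on the log scale, just like the statsmodels GLM output above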
Real-World Applications of Poisson Regression
| Example Scenario | Description |
| --- | --- |
| Public Health – Disease Incidence | Model disease counts based on risk factors. |
| Insurance – Claim Counts | Predict number of claims per policy. |
| Call Center Volume | Forecast call counts per hour. |
| Web Analytics | Count visits or clicks in time intervals. |
Also Read: Generalized Linear Models (GLM): Applications, Interpretation, and Challenges
16. Cox Regression (Proportional Hazards Model)
Cox Regression, or Cox Proportional Hazards Model, is a regression technique used for survival analysis (time-to-event data). Unlike previous regressions, which predict a numeric value directly, Cox regression models the hazard function – essentially the instantaneous risk of the event occurring at time t, given that it hasn’t occurred before t, as a function of covariates.
It is a semi-parametric model: it doesn’t assume a particular baseline hazard function form, but it assumes the effect of covariates is multiplicative on the hazard and constant over time (hence “proportional hazards”).
Here's the formula:
h(t | X) = h0(t) * exp( b1*x1 + ... + bp*xp )
In the equation above:
- h0(t) is the baseline hazard (covariates = 0).
- b1, ..., bp are coefficients for the features x1, ..., xp.
- exp(b_j) is the hazard ratio for x_j; a value > 1 implies increased hazard, < 1 implies decreased hazard.
Cox regression is often used to estimate these hazard ratios for factors while accounting for censoring (some subjects’ events not observed within study time).
When to Use?
- Time-to-Event Data: Best for analyzing how covariates affect the timing of an event (death, machine failure, churn).
- Presence of Censoring: Suited for data with incomplete observations (individuals or items still “alive” at study’s end).
- Wide Application: Extensively adopted in clinical trials, reliability engineering, and customer analytics where the duration until an event matters.
Key Characteristics of Cox Regression
- Indirect Timeline: Doesn’t directly predict event times but enables hazard comparisons and survival curve derivation.
- Censored Data: Gracefully accommodates records where the event hasn’t yet occurred by the study’s end.
- Proportional Hazards Assumption: Covariate effects are assumed constant over time; extensions or stratification are needed if this assumption breaks.
- Partial Likelihood Estimation: Simplifies parameter estimation by canceling out the baseline hazard.
- Hazard Ratios: Each coefficient β translates to exp(β); a hazard ratio of 2.0 doubles the instantaneous risk for subjects with that covariate.
Code Snippet
This code illustrates how to model survival data with Cox Regression:
- Survival Data: We have a time column (how long before event/censoring) and an event column (1 = event happened, 0 = censored).
- CoxPHFitter: Fits a proportional hazards model.
- Summary: Shows coefficients, standard errors, and confidence intervals.
from lifelines import CoxPHFitter
import pandas as pd
# Sample dataset with survival times and covariates
data = pd.DataFrame({
'time': [5, 8, 12, 3, 15], # Survival time
'event': [1, 1, 0, 1, 0], # Event (1) or censored (0)
'age': [45, 50, 60, 35, 55],
'treatment': [1, 0, 1, 1, 0]
})
cph = CoxPHFitter()
cph.fit(data, duration_col='time', event_col='event')
cph.print_summary()
Output
When you run this code, you’ll see a table describing model coefficients, including hazard ratios:
- coef: Estimated effect of the covariate on the log hazard.
- exp(coef): Hazard ratio (values > 1 indicate higher hazard, < 1 indicate lower hazard).
- se(coef): Standard error for each coefficient.
- Confidence Intervals: Range in which the true coefficient likely lies.
- Concordance: A measure of how well the model predicts ordering of events (closer to 1 is better).
coef exp(coef) se(coef) ...
age 0.0150 1.0151 0.0260
treatment -0.1100 0.8958 0.0300
...
Concordance = 0.80
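Once fitted, the model can also produce survival curves for new, hypothetical subjects. A minimal sketch reusing the cph object from above (the covariate values are illustrative):
new_subjects = pd.DataFrame({'age': [40, 65], 'treatment': [1, 0]})
surv_curves = cph.predict_survival_function(new_subjects)
print(surv_curves.head())  # rows = time points, columns = subjects' survival probabilities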
Real-World Applications of Cox Regression
| Example Scenario | Description |
| --- | --- |
| Clinical Trial Survival | Compare hazard rates of new drug vs. placebo. |
| Customer Churn | Model time until churn with hazard ratios. |
| Mechanical Failure | Assess how conditions affect failure time. |
| Employee Turnover | Evaluate hazard of leaving given covariates. |
Cox regression is a powerful tool in regression analysis that focuses on time-to-event outcomes. It bridges statistics and practical decision-making, especially in life sciences and engineering. It provides insight into how factors impact the rate of an event occurrence over time.
17. Time Series Regression
Time Series Regression refers to regression methods specifically applied to time-indexed data where temporal order matters. In a narrow sense, it could mean using time as an explicit variable in a regression.
More broadly, it often involves using lagged values of the target or other time series as features to predict the target at future time steps. Many classical time series models (AR, ARMA, ARIMAX) can be viewed as regression models on past values.
Examples:
- Autoregressive (AR) Model: y_t = φ1 * y_(t-1) + ... + φp * y_(t-p) + ε_t. This is regression of a time series on its own past p values.
- AR With Exogenous Variables (ARX or ARIMAX): y_t = φ1*y_(t-1) + ... + φp*y_(t-p) + θ0*x_t + θ1*x_(t-1) + ... + ε_t. This is effectively a regression on past y’s plus current/past external regressors X (like weather affecting sales).
- Distributed Lag Models: regression where current y depends on past values of some X as well.
- Trend Regression: regress y on time (and perhaps time^2, etc.) to capture trend, possibly also seasonal dummy variables.
Time series regression often requires handling autocorrelation in residuals (which violates standard regression assumptions). Techniques like adding lag terms, using ARIMA errors, or generalized least squares are used.
When to Use?
- Sequential or Temporally Ordered Data: Ideal for datasets where the order and spacing of observations hold significant information.
- Forecasting: Commonly applied to predict future values (e.g., next month’s sales, next hour’s sensor reading).
- Temporal Dynamics: Useful if past values, trends, or seasonality patterns affect current or future outcomes.
Key Characteristics of Time Series Regression
- Autocorrelation Handling: Often requires tests (Durbin-Watson) or additional AR terms to manage correlated errors.
- Specialized Alternatives: ARIMA, exponential smoothing, or neural nets can outperform simple regression in many cases.
- Temporal Features: Commonly includes day-of-week, month, holiday variables, and lagged terms to model seasonality and events.
- Forecast-Centric Evaluation: Usually validated on the chronological “tail” of data rather than random splits.
- Relation to ARIMAX: Time series regression can be viewed as ARIMAX when including moving-average error terms alongside exogenous variables.
Code Snippet
Use this approach when you want to see if there’s a trend over time in your numerical data.
- Data Setup: We create a DataFrame with columns time and sales.
- Add Trend: We add a constant term (const) to enable an intercept in our regression.
- OLS Fit: We run an ordinary least squares regression of sales on [const, time].
from statsmodels.tsa.tsatools import add_trend
import pandas as pd
from statsmodels.api import OLS
# Sample dataset
data = pd.DataFrame({
'time': [1, 2, 3, 4, 5],
'sales': [200, 220, 240, 230, 260]
})
# Add a constant (intercept)
data = add_trend(data, trend='c')
model = OLS(data['sales'], data[['const', 'time']]).fit()
print(model.summary())
Output
The output is an OLS summary table with something like this:
OLS Regression Results
==============================================================================
Dep. Variable: sales R-squared: 0.76
Model: OLS Adj. R-squared: 0.68
...
coef std err t P>|t|
-------------------------------------------------------------------------------
const 190.0000 15.811 12.025 0.001
time 10.0000 4.000 2.500 0.071
...
The slope might be around 10, indicating an average increase of 10 sales units per time step.
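Beyond a simple time trend, you can also regress on lagged values of the target, as described earlier. A minimal sketch using pandas (the sales figures are illustrative):
# Build lagged predictors so that sales at time t is regressed on sales at t-1 and t-2
df = pd.DataFrame({'sales': [200, 220, 240, 230, 260, 255, 270]})
df['lag1'] = df['sales'].shift(1)
df['lag2'] = df['sales'].shift(2)
df = df.dropna()  # the first rows have no lag values
X_lags, y = df[['lag1', 'lag2']], df['sales']
# X_lags can now be fed to OLS (or any other regressor) exactly like the trend example above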
Real-World Applications of Time Series Regression
| Example Scenario | Description |
| --- | --- |
| Economic Forecasting | Use lagged GDP and indicators. |
| Energy Load Forecasting | Predict next-day demand from weather and past usage. |
| Website Traffic | Forecast daily visits with seasonal patterns. |
| Stock Prices | Regress on past prices and macro data. |
18. Panel Data Regression (Fixed & Random Effects)
Panel Data Regression is used when you have panel data (also called longitudinal data) – that is, multiple entities observed across time (or other contexts).
For example, test scores of multiple schools measured yearly or economic data of multiple countries over decades. Panel data regression models aim to account for both cross-sectional and time series variation, often focusing on controlling for unobserved heterogeneity across entities.
Two common approaches: Fixed Effects (FE) and Random Effects (RE) models
- Fixed Effects: This model introduces entity-specific intercepts (or coefficients) that capture each entity's time-invariant characteristics. Essentially, each individual (or group) has its own baseline.
FE uses dummy variables for entities (or, equivalently, de-means the data within each entity) to control for any constant omitted factors for that entity. It focuses on within-entity variation (how changes over time in X relate to changes in Y for the same entity).
- Random Effects: This treats the individual-specific effect as a random variable drawn from a distribution. It assumes that this random effect is not correlated with the independent variables (which is a strong assumption).
The benefit is it can include time-invariant covariates (whereas FE cannot since dummies absorb those), and generally, RE is more efficient if its assumptions hold.
When to Use?
- Two-Dimensional Data: Suited for datasets with multiple entities observed across multiple time points.
- Unobserved Heterogeneity: Controls for entity-specific traits (e.g., country- or firm-specific characteristics), yielding unbiased estimates.
- Policy & Impact Analysis: Common in economics or social sciences to isolate an intervention’s effect from intrinsic differences among entities.
Key Characteristics of Panel Data Regression
- Fixed vs. Random Effects: Fixed Effects treat each entity as its own control; Random Effects pool information across entities but risk bias if the RE assumption is invalid (Hausman test can help decide).
- Two-Way Fixed Effects: Allows controlling for both entity-specific and time-specific factors (e.g., global trends).
- Improved Causal Interpretation: Controls for unobserved, constant differences among entities (e.g., city-specific pollution levels).
- Within-Entity Variation: Fixed Effects require changes over time within each entity; if a predictor never varies, its effect can’t be estimated under FE. In such cases, Random Effects may be more suitable.
Code Snippet
Use this approach for data that tracks multiple entities over time, allowing you to account for differences between entities.
- Data Setup: Each observation has an id, a year, and variables y (outcome) and x (predictor).
- Indexing: We set id and year as a multi-index, so each row corresponds to a specific entity-year pair.
- PanelOLS: Using entity_effects=True incorporates fixed effects for each entity.
import statsmodels.api as sm
from linearmodels.panel import PanelOLS
import pandas as pd
# Sample panel dataset
panel_data = pd.DataFrame({
'id': [1, 1, 2, 2, 3, 3],
'year': [2020, 2021, 2020, 2021, 2020, 2021],
'y': [3, 4, 2, 5, 1, 3],
'x': [10, 12, 8, 9, 7, 6]
}).set_index(['id', 'year'])
model = PanelOLS(panel_data['y'], sm.add_constant(panel_data['x']), entity_effects=True)
results = model.fit()
print(results.summary)
Output
When you run this, the summary helps you see how x relates to y once you control for entity-specific intercepts:
- Coefficients: The effect of each predictor (e.g., x) on the outcome.
- Entity Effects: Each entity has its own baseline.
- Within R-squared: How well the model explains variation within the same entity across time.
- Between R-squared: Variation explained across entities, averaged over time.
PanelOLS Estimation Summary
================================================================================
Dep. Variable: y R-squared: 0.50
...
Coefficients Std. Err. T-stat P-value ...
x 0.5000 0.2500 2.0000 0.1300
...
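For comparison, the Random Effects counterpart can be fitted on the same data. A minimal sketch, assuming the RE condition (entity effects uncorrelated with x) is plausible:
from linearmodels.panel import RandomEffects
re_model = RandomEffects(panel_data['y'], sm.add_constant(panel_data['x']))
re_results = re_model.fit()
print(re_results.summary)
In practice, you would compare the FE and RE estimates (e.g., via a Hausman-style test) before settling on one specification.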
Real-World Applications of Panel Data Regression
| Example Scenario | Description |
| --- | --- |
| Economics – Policy Impact | Control for state-specific and time effects. |
| Education | Within-student changes to test scores over time. |
| Marketing – Panel Surveys | Account for consumer-specific baselines. |
| Manufacturing | Different machines tracked over time, controlling for machine-specific traits. |
What Are the Benefits of Regression Analysis?
Regression analysis in machine learning offers a range of benefits, making it an indispensable tool in data-driven decision-making and predictive modeling.
Here’s how it adds value:
| Benefit | Description |
| --- | --- |
| Quantifying Relationships | Measures how independent variables impact a dependent variable. |
| Prediction and Forecasting | Enables accurate predictions for continuous outcomes. |
| Identifying Significant Variables | Highlights the most influential predictors among multiple variables. |
| Model Evaluation | Provides tools like R-squared and error metrics to evaluate model performance. |
| Control and Optimization | Optimizes processes by understanding variable interactions. |
| Risk Management | Assesses potential risks by analyzing variable relationships and their uncertainty. |
| Decision Support | Guides strategic choices with data-backed insights for better resource allocation and planning. |
How Can upGrad Help You?
upGrad’s data science and machine learning courses equip you with the skills to master regression analysis through the following mediums:
- Comprehensive Programs: In-depth modules on types of regression models in machine learning and their applications.
- Hands-On Learning: Real-world datasets, projects, and tools like Python and R.
- Career Support: Resume building, interview prep, and job placement assistance.
Here are some of the best AI and ML courses you can try:
- Executive Program in Generative AI for Leaders
- Master of Science in Machine Learning & AI
- Executive Diploma in Machine Learning and AI with IIIT-B
Ready to advance your career? Get personalized counseling from upGrad’s experts to help you choose the right program for your goals. You can also visit your nearest upGrad Career Center to kickstart your future!
Frequently Asked Questions (FAQs)
1. What are the three types of multiple regression?
2. What is the difference between linear regression and logistic regression?
3. What are the different types of stepwise regression?
4. Which is better, Lasso or Ridge?
5. What is multicollinearity in regression?
6. What are the 2 main types of regression?
7. What is multivariate regression in machine learning?
8. What are the different types of multivariate regression?
9. What is ordinal regression in ML?
10. When to use logistic regression?
11. What is autocorrelation in regression?