
Isotonic Regression in Machine Learning: A Comprehensive Guide with Python Implementation

By Pavan Vadapalli

Updated on Feb 19, 2025 | 10 min read | 7.6k views


Regression analysis is a key principle in the domain of machine learning. It belongs to supervised learning, where the algorithm is trained on both input features and output labels.

Regression techniques are statistical methods for exploring the relationship between a dependent variable and one or more independent variables. They let researchers predict how changes in the independent variables will influence the dependent variable, in effect finding a "best fit" line through the data points that captures the connection between variables.

Among the regression methods, work on isotonic regression in machine learning has focused mainly on efficient algorithms, such as the Pool Adjacent Violators Algorithm (PAVA), for fitting non-decreasing functions to data while preserving the ordering of the input variable. This makes it a useful tool whenever a monotonic relationship is expected between features and target values.

What is Isotonic Regression?

Isotonic regression is an effective method when dealing with non-linear data that violates the assumptions of a linear regression model. By fitting a non-decreasing function to the dataset, it produces a piecewise-constant approximation that captures the underlying trend in the data. It is also an important topic in machine learning courses.

In this article, we demonstrate how to employ scikit-learn's IsotonicRegression class to create an isotonic regression model, generate predictions, and visualize the outcomes. 

Also Read: What is the Difference Between Correlation and Regression?

Comparison with Linear Regression and Other Regression Techniques

Nature of Relationship

  • Isotonic Regression: always monotonic
  • Linear Regression: a straight line
  • Polynomial Regression: non-linear
  • Logistic Regression: a probabilistic (S-shaped) relationship
  • Decision Tree Regression: non-linear

Function Assumption

  • Isotonic Regression: the relationship between the independent and dependent variables is monotonic
  • Linear Regression: there exists a linear relationship between the dependent variable (outcome) and the independent variables (predictors)
  • Polynomial Regression: the relationship between the dependent and independent variables can be described by a polynomial function
  • Logistic Regression: the relationship between the independent variables and the log odds of the dependent variable is linear
  • Decision Tree Regression: the data can be effectively split into subsets based on the input features

Flexibility

  • Isotonic Regression: flexible within the monotonicity constraint
  • Linear Regression: relatively low
  • Polynomial Regression: high (grows with the polynomial degree)
  • Logistic Regression: relatively low (linear in the log odds)
  • Decision Tree Regression: highly flexible

Interpretability

  • Isotonic Regression: highly interpretable
  • Linear Regression: intrinsically interpretable
  • Polynomial Regression: more challenging
  • Logistic Regression: highly interpretable
  • Decision Tree Regression: highly interpretable

Overfitting Risk

  • Isotonic Regression: high on noisy data (many small segments)
  • Linear Regression: low
  • Polynomial Regression: high, especially for high-degree polynomials
  • Logistic Regression: low to moderate
  • Decision Tree Regression: high without pruning or depth limits

Computational Complexity

  • Isotonic Regression: O(n) via PAVA on sorted data; O(n log n) including the sort
  • Linear Regression: O(nd² + d³) via the normal equations, for n samples and d features
  • Polynomial Regression: the same as linear regression on the expanded feature set
  • Logistic Regression: O(nd) per gradient-descent iteration
  • Decision Tree Regression: O(d · n log n)

Common Applications

  • Isotonic Regression: probability calibration of machine learning models, ranking systems, data visualization
  • Linear Regression: market analysis, financial analysis, sports analytics, environmental health, medicine
  • Polynomial Regression: modeling stock trends, analyzing system performance curves, predicting growth patterns
  • Logistic Regression: fraud detection, disease-spread prediction, illness mortality prediction
  • Decision Tree Regression: sales forecasting, predicting property prices, estimating customer lifetime value, analyzing financial performance

Also Read: Machine Learning Models Explained


Mathematical Foundation of Isotonic Regression

Isotonic regression is a fundamental mathematical idea in many statistical-inference problems that involve order constraints.

Formulation

Objective function

For observations y_i (ordered by the predictor x) with weights w_i and fitted values y_hat_i, minimize the weighted sum of squared errors:

  • min ∑(w_i * (y_i - y_hat_i)^2)

This minimization can be solved as a quadratic program (QP). The passage below sketches the approach in SAS/IML: assume the variables x and y are read into IML vectors, that neither has missing values, and let n be the length of the vectors.

Whether you use the older NLPQUA subroutine or the newer QPSOLVE subroutine, you must specify the matrix Q and the vector v for the QP. Because Q=2*I is a diagonal matrix, you could express Q as a dense matrix: Q=2*I(nrow(y)). But this is a waste of memory. Both NLPQUA and QPSOLVE enable you to use a three-column sparse representation for Q. The first column is the set of row indices (1:n), the second is the set of column indices (also 1:n), and the third column is the set of values (here, all 2s).

Although the constant term is not necessary for the optimization, you can optionally append it to the v vector.
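To make the three-column sparse form concrete, here is a minimal Python sketch of the same idea, building Q = 2I from (row, column, value) triplets with scipy.sparse. It only illustrates the representation, not the NLPQUA input format itself, and the size n = 5 is an arbitrary choice:

import numpy as np
from scipy import sparse

n = 5                    # illustrative problem size
rows = np.arange(n)      # row indices of the nonzero entries (0-based here)
cols = np.arange(n)      # column indices; Q is diagonal, so rows == cols
vals = np.full(n, 2.0)   # every diagonal entry of Q = 2*I equals 2

## assemble the matrix from the three parallel columns
Q = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n))
print(Q.toarray())       # dense view, only sensible for this tiny example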

Constraint

  • y_hat_i <= y_hat_j whenever x_i <= x_j

Specifying constraints for the NLPQUA subroutine is a little complicated; the specification was greatly simplified in the QPSOLVE subroutine. Both use the same building blocks, so we'll show the harder of the two cases: the specification for NLPQUA.

The monotonicity constraints are encoded in an (n-1) x n bi-diagonal matrix A, with -1 and +1 in adjacent positions of each row, so that row i expresses y_hat_{i+1} - y_hat_i >= 0. For NLPQUA, A is augmented with two extra columns: the (n+1)th column encodes the inequality operation, and the (n+2)th column contains the right-hand side of the constraint system, which is 0 in this problem.

For the NLPQUA subroutine, the linear constraint matrix must be appended to a representation for the boundary constraints. There are no boundary constraints for this problem, so you can use a matrix of missing values:

/* specify the parameter bounds, if any */
bounds = j(2, n+2, .);

/* for NLPQUA, the bounds and linear constraint matrix are
   assembled into one argument; in QPSOLVE, they are separate */
con = bounds // lincon;   /* for NLPQUA */

The remaining step is to specify an initial guess for the vector of parameters. The following statements set all parameters to the mean of the response values and pass the information to the NLPQUA routine, which solves for the c vector:

p0  = j(1, n, mean(y));   /* initial guess is c[i] = mean(y) */
opt = {0 0};              /* minimize quadratic; no print */
call nlpqua(rc, c, sparseQ, p0, opt) BLC=con lin=linvec;
pred = colvec(c);
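For readers following along in Python rather than SAS/IML, the same quadratic program can be sketched with SciPy. This is a minimal illustration of the formulation above, with an invented toy y vector; scikit-learn's IsotonicRegression, used later in this article, remains the practical choice:

import numpy as np
from scipy.optimize import minimize, LinearConstraint

y = np.array([1.0, 3.0, 2.0, 4.0, 3.5, 6.0])   # toy responses, ordered by x
w = np.ones_like(y)                            # unit weights
n = len(y)

def objective(c):
    ## weighted sum of squared errors
    return np.sum(w * (y - c) ** 2)

## bi-diagonal constraint matrix A: row i encodes c[i+1] - c[i] >= 0
A = np.zeros((n - 1, n))
A[np.arange(n - 1), np.arange(n - 1)] = -1.0
A[np.arange(n - 1), np.arange(1, n)] = 1.0
monotone = LinearConstraint(A, lb=0.0, ub=np.inf)

## initial guess: every parameter equals the mean response, as in the IML code
res = minimize(objective, x0=np.full(n, y.mean()),
               constraints=[monotone], method="SLSQP")
print(np.round(res.x, 3))   # a non-decreasing fitted sequence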

Pool Adjacent Violators Algorithm (PAVA)

The PAVA method performs exactly as its name indicates: it scans the points, and when it finds a point that violates the monotonicity constraint, it merges ("pools") that value with its neighbors, which eventually come together to form a block.

Specifically, PAVA performs the following steps:

  • [Initialization] Treat each observation as its own block, with the block value ν equal to the (weighted) observation.
  • [Termination] If the block values are already non-decreasing (the primal feasibility condition), stop.
  • [Pool Adjacent Violators] Select any j for which ν_j is greater than ν_{j+1}, and "pool" the blocks containing j and j + 1 into a single block whose value is the weighted average of the observations in the pooled block. (This step preserves the algorithm's other invariants.)
  • Return to the termination check.

Now that we have a fundamental understanding of how this algorithm operates, let's put it into practice using Python.

Initially, we must import the necessary libraries: 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression
from matplotlib.collections import LineCollection
import random

Then let’s generate some points and add some random noise to them:

n = 100   # number of points
x = np.arange(n)
y = 50 * np.log(1 + x) + np.random.randint(-50, 50, size=(n,))
plt.scatter(x, y)
plt.show()
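With the noisy data in hand, here is a short, self-contained PAVA sketch that fits a non-decreasing step function to the points generated above. It is our own illustrative implementation of the steps listed earlier, not a library routine:

def pava(y, w=None):
    ## Pool Adjacent Violators: weighted least-squares fit of a
    ## non-decreasing sequence to y (assumed already ordered by x)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    means, weights, sizes = [], [], []   # one entry per current block
    for yi, wi in zip(y, w):
        means.append(yi); weights.append(wi); sizes.append(1)
        # pool while the last two blocks violate monotonicity
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, s2 = means.pop(), weights.pop(), sizes.pop()
            m1, w1, s1 = means.pop(), weights.pop(), sizes.pop()
            wsum = w1 + w2
            means.append((w1 * m1 + w2 * m2) / wsum)
            weights.append(wsum)
            sizes.append(s1 + s2)
    ## expand each block's mean back to one fitted value per point
    return np.concatenate([np.full(s, m) for m, s in zip(means, sizes)])

y_fit = pava(y)
plt.scatter(x, y, alpha=0.5)
plt.plot(x, y_fit, color='r', linewidth=2)
plt.show()

The fitted curve is a non-decreasing step function: each flat stretch is one pooled block, exactly as the algorithm description predicts.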

Applications of Isotonic Regression in Machine Learning

  • Probability Calibration

Classifiers often produce scores that do not match true class frequencies. In logistic regression, for example, the model may report a probability p(1) for a variable x that is lower than the likelihood actually observed in the real world. In these situations, isotonic regression in machine learning is extremely useful for calibrating or enhancing such probability estimates (see the sketch after this list).

  • Dose-Response Analysis

Dose-response modeling is a technique employed to quantitatively evaluate the connection between exposure to a chemical (or other stressor) and its associated effects. In cases where a medical characteristic (such as dosage) is anticipated to uniformly affect an outcome (such as patient recovery), isotonic regression is applicable for modeling this association.

  • Non-Metric Multidimensional Scaling

Isotonic regression is also useful when dealing with several input variables. We can examine every single dimension as a one-dimensional monotone function and interpolate it linearly, which facilitates simple non-metric multidimensional scaling.
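As a concrete illustration of the calibration use case, the sketch below wraps a logistic-regression classifier in scikit-learn's CalibratedClassifierCV with method="isotonic". The synthetic dataset and all parameter values are arbitrary choices for the example:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

## synthetic binary-classification data, purely for illustration
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = LogisticRegression(max_iter=1000)
## isotonic regression maps the base model's scores to calibrated probabilities
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3)
calibrated.fit(X_train, y_train)

probs = calibrated.predict_proba(X_test)[:, 1]   # calibrated P(class = 1)
print(probs[:5])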

Advantages & Limitations of Isotonic Regression

Advantages

  • No constraint of linearity

Isotonic regression does not necessitate a linear relationship between variables, enabling it to identify non-linear monotonic patterns in data that a linear model would fail to adequately represent. 

  • Maintains the same sequence

The model guarantees that the forecasted values preserve the same relative sequence as the original data, which is essential in cases where the arrangement of data points matters. 

  • Simple to understand

The result of isotonic regression in machine learning is a piecewise-constant function, making it easy to comprehend and visualize.

  • Non-parametric

Being a non-parametric technique, isotonic regression does not necessitate assumptions regarding the underlying data distribution, allowing it to be flexible in various situations. 

  • Calibration in machine learning

Isotonic regression is frequently employed to adjust the predicted probabilities of machine learning models, making sure that the predicted probabilities correspond with the real class probabilities. 

Limitations

  • Overfitting 

When data contains significant noise, isotonic regression may fit excessively by generating numerous small segments, resulting in weak performance on new data (see the sketch after this list).

  • Underfitting 

If the true relationship among variables isn't genuinely monotonic, isotonic regression may underfit by failing to reflect the real trend, resulting in a subpar fit to the data. 

  • Challenges with Scalability

Isotonic regression can become computationally costly and slow for large datasets because of the necessity to sort the data. 

  • Restricted Adaptability

Isotonic regression is limited to modeling monotonic relationships (either rising or falling), thus it is ineffective for scenarios where the relationship is intricate or non-monotonic. 

  • Responsiveness to Outliers

Similar to various regression methods, isotonic regression may be influenced by outliers, which can considerably impact the resulting fitted line. 
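Both failure modes are easy to reproduce. The short sketch below uses synthetic data with an arbitrary seed to show isotonic regression chasing noise on one dataset and missing a non-monotonic trend on another:

import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
x_demo = np.linspace(0, 1, 200)

## overfitting: a noisy increasing trend yields many small fitted segments
y_noisy = x_demo + rng.normal(scale=0.5, size=200)
fit_noisy = IsotonicRegression().fit_transform(x_demo, y_noisy)
print("segments fitted to the noisy trend:", len(np.unique(fit_noisy)))

## underfitting: a parabola is non-monotonic, so a monotone fit cannot track it
y_parab = -(x_demo - 0.5) ** 2
fit_parab = IsotonicRegression().fit_transform(x_demo, y_parab)
print("max error on the parabola:", np.abs(fit_parab - y_parab).max())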

Implementation of Isotonic Regression in Python using Scikit-learn

Step 1: Import the necessary libraries

Let's start by importing the IsotonicRegression class from the sklearn.isotonic module, which is part of the Scikit-learn library.

from sklearn.isotonic import IsotonicRegression

Step 2: Create sample data

Next, we need to create some sample data to fit our isotonic regression model. In this example, we will generate two arrays, X and y, representing the input data and the target values, respectively.

import numpy as np

## Generate random input data
np.random.seed(0)
X = np.random.rand(100)
y = 4 * X + np.random.randn(100)

Step 3: Fit the isotonic regression model

Now, we can fit the isotonic regression model to our data. We create an instance of the IsotonicRegression class and call the fit method with our input data and target values. Setting out_of_bounds="clip" makes the model clip predictions for inputs outside the training range instead of returning NaN.

## Fit isotonic regression model
ir = IsotonicRegression(out_of_bounds="clip")
ir.fit(X, y)

Step 4: Predict using the model

After fitting the model, we can use it to make predictions on new data. Let's create a new array X_new and predict the corresponding target values.

## Create new data for prediction
X_new = np.linspace(0, 1, 100)
y_pred = ir.predict(X_new)

Step 5: Visualize the results

Finally, let's visualize the results of our isotonic regression model. We can plot the original data points as scatter points and the predicted values as a line.

import matplotlib.pyplot as plt

## Plot the original data and predicted values
plt.scatter(X, y, c='b', label='Original Data')
plt.plot(X_new, y_pred, c='r', label='Isotonic Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Also Read: Python In-Built Function [With Syntax and Examples]

How upGrad Will Help You

If you want deeper insight into isotonic regression in machine learning or other machine learning concepts, consider exploring IIIT-B and upGrad's PG Diploma in Machine Learning and AI, India's top-selling program with a 4.5-star rating. The program features over 450 hours of instruction, more than 30 case studies, and assignments, helping students acquire top-quality skills in machine learning and AI.


For more information and career guidance, reach out to our experts.


Frequently Asked Questions

1. How does isotonic regression differ from linear regression?

2. In which scenarios is isotonic regression used?

3. How is isotonic regression implemented in Python?

4. How does isotonic regression handle non-linear data?

5. Can isotonic regression be used for probability calibration?

6. How does isotonic regression ensure monotonicity in predictions?

7. Is isotonic regression suitable for high-dimensional data?

8. Can isotonic regression be applied to time-series data?

9. How does isotonic regression handle missing data?

10. How does isotonic regression relate to convex optimization?

11. Can isotonic regression be combined with other machine-learning models?

