Logistic Regression for Machine Learning: A Complete Guide
Updated on 25 October, 2024
The demand for data science skills in India is growing steadily. Machine learning and data science have been front-runner skills in a changing workforce for the past few years. Concepts such as logistic regression, linear regression, and gradient descent are among the key ideas you apply in machine learning projects with Python as an aspiring data scientist.
Every machine learning algorithm performs best under a given set of conditions. To ensure good performance, we must know which algorithm to use depending on the problem at hand. You cannot just use one particular algorithm for all problems. For example, a Linear regression algorithm cannot be applied to a categorical dependent variable. This is where Logistic Regression comes in.
Let’s dive right in and learn what logistic regression is and how we can apply it to business problems.
Logistic regression is a popular statistical model used for binary classification, that is, for predictions of the type this or that, yes or no, A or B, and so on. It can also be extended to multiclass classification, but here we will focus on its simplest application. It is one of the most frequently used machine learning algorithms for binary classification. It maps each input to a probability and then to one of two classes, 0 or 1. For example,
- 0: negative class
- 1: positive class
Some examples of classification are mentioned below:
- Email: spam / not spam
- Online transactions: fraudulent / not fraudulent
- Tumor: malignant / not malignant
Let us look at the issues we encounter when we use linear regression for classification.
Issue 1 of Linear Regression
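Suppose we fit a straight line to tumor-size data (label 1 for malignant, 0 for benign) and predict "malignant" wherever the line is at least 0.5. Now add one more malignant tumor with a very large size on the extreme right. The fitted line becomes less steep and the point where it crosses 0.5 moves to the right, so some malignant tumors are now predicted as benign.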
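To make Issue 1 concrete, here is a minimal sketch using made-up tumor sizes. The data are hypothetical and not taken from the original graph. It fits a straight line with NumPy's least-squares `np.polyfit` and finds where the line crosses 0.5. It then shows that one extra, very large malignant tumor moves that crossing point to the right.

```python
import numpy as np

# Hypothetical toy data: tumor size in cm, label 1 = malignant, 0 = benign
sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])


def boundary(x, y):
    """Fit y = m*x + c by least squares; return the x at which the line equals 0.5."""
    m, c = np.polyfit(x, y, deg=1)
    return (0.5 - c) / m


before = boundary(sizes, labels)
# Add a single very large malignant tumor on the extreme right
after = boundary(np.append(sizes, 10.0), np.append(labels, 1))

print(f"0.5 crossing before: {before:.2f} cm, after: {after:.2f} cm")
# The 3.0 cm malignant tumor now lies below the crossing point, so it is predicted benign
print("3.0 cm tumor predicted malignant after refit:", 3.0 >= after)
```

With these numbers the crossing point moves from 2.75 cm to roughly 3.1 cm. The 3.0 cm malignant tumor is therefore misclassified once the outlier is added.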
Logistic regression is one of the tools used to develop machine learning models. Many other algorithms exist, and the right one depends on the use case at hand. To choose well, you should be aware of the available options; only then can you select the algorithm that best fits your data set.
Issue 2 of Linear Regression
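- The linear hypothesis hθ(x) can output values larger than 1 or smaller than 0, which cannot be interpreted as probabilities.
- Hence, we need a model whose output stays between 0 and 1, which is what logistic regression provides.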
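A short sketch of Issue 2, using the same kind of made-up tumor data (hypothetical values). A least-squares line fitted to 0/1 labels produces "probabilities" below 0 for small tumors and above 1 for large ones.

```python
import numpy as np

# Hypothetical toy data: tumor size in cm, label 1 = malignant, 0 = benign
sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Linear hypothesis h(x) = m*x + c fitted by least squares
m, c = np.polyfit(sizes, labels, deg=1)

h_small = m * 1.0 + c  # a small benign tumor from the training data
h_large = m * 6.0 + c  # a large tumor outside the training range
print(round(h_small, 3), round(h_large, 3))  # one value below 0, one above 1
```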
What is Logistic Regression?
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is binary. Like other regression techniques, it is a predictive model. It estimates the relationship between one binary dependent variable and one or more independent variables. Its output is a probability between 0 and 1, which is then converted into one of the two classes.
A simple example: do calorie intake, weather, and age influence the risk of having a heart attack? The question has a discrete answer, either “yes” or “no”.
Logistic Regression Hypothesis
The logistic regression classifier can be derived by analogy to the linear regression hypothesis, which is:
$$h_\theta(x) = \theta^T x$$
However, the logistic regression hypothesis generalizes the linear regression hypothesis by passing it through the logistic function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
The result is the logistic regression hypothesis:
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
The function g(z) is the logistic function, also known as the sigmoid function.
The logistic function has asymptotes at 0 and 1, and it crosses the y-axis at 0.5.
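Here is a minimal NumPy sketch of the hypothesis. The parameter vector `theta` and the input `x` are made-up values. As is common, the first entry of `x` is a constant 1 so that `theta[0]` acts as the intercept.

```python
import numpy as np


def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def hypothesis(theta, x):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))


# g(0) is exactly 0.5; large |z| approaches the asymptotes 0 and 1
print(sigmoid(0.0))
print(sigmoid(np.array([-10.0, 10.0])))

# Hypothetical parameters: intercept -3 and weight 1.5 for a single feature
theta = np.array([-3.0, 1.5])
x = np.array([1.0, 2.0])  # leading 1.0 multiplies the intercept
print(hypothesis(theta, x))  # theta^T x = 0, so the output is 0.5
```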
How does Logistic Regression work?
Instead of a linear function, logistic regression passes the linear combination of the inputs through the sigmoid function, also known as the logistic function. It also uses a different cost function from linear regression (the log loss, covered below).
The sigmoid limits the hypothesis of logistic regression to values between 0 and 1. A linear function cannot represent this, because it can take values greater than 1 or less than 0, which are not valid probabilities.
The sigmoid function maps any real value to a value between 0 and 1. In machine learning, we use it to map predictions to probabilities.
Formula:
$$f(x) = \frac{1}{1 + e^{-x}}$$
Where,
f(x) = output between 0 and 1 (probability estimate)
x = input to the function
e = base of natural log
Decision Boundary
The prediction function returns a probability score between 0 and 1. To map it to a discrete class (true/false, yes/no), you select a threshold value. Predictions at or above the threshold are classified as class 1, and predictions below it as class 0.
p≥0.5,class=1
p<0.5,class=0
For example, suppose the threshold value is 0.5 and your prediction function returns 0.7, it will be classified as positive. If your predicted value is 0.2, which is less than the threshold value, it will be classified as negative. For logistic regression with multiple classes we could select the class with the highest predicted probability.
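A minimal sketch of both rules described above, using hypothetical probability values. The first rule applies a 0.5 threshold for the binary case. The second picks the class with the highest predicted probability for the multiclass case.

```python
import numpy as np

# Hypothetical predicted probabilities for the positive class
probs = np.array([0.7, 0.2, 0.5, 0.49])
threshold = 0.5
classes = (probs >= threshold).astype(int)
print(classes)  # [1 0 1 0]

# Multiclass: each row holds one probability per class (rows sum to 1)
multi_probs = np.array([[0.1, 0.6, 0.3],
                        [0.5, 0.2, 0.3]])
print(multi_probs.argmax(axis=1))  # [1 0]
```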
Our aim is to choose the parameters that make the observed labels as likely as possible, an approach called Maximum Likelihood Estimation (MLE). MLE is a general approach to estimating parameters in statistical models, and the likelihood can be maximized using an optimization algorithm. Newton’s Method is one such algorithm; it can find the maximum (or minimum) of many different functions, including the likelihood function. Gradient descent, applied to the negative log-likelihood, is another option.
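Below is a minimal sketch of Newton's method applied to the logistic log-likelihood, a procedure often implemented as iteratively reweighted least squares. It is not the article's own code. The small data set is made up and deliberately not perfectly separable, because the maximum likelihood estimate does not exist for separable data.

```python
import numpy as np


def fit_newton(X, y, n_iter=25, tol=1e-10):
    """Maximize the logistic log-likelihood with Newton's method.

    X must already contain a column of ones for the intercept.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))           # predicted probabilities
        gradient = X.T @ (y - p)                         # gradient of the log-likelihood
        hessian = X.T @ (X * (p * (1 - p))[:, None])     # X^T W X (negated Hessian)
        step = np.linalg.solve(hessian, gradient)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta


# Hypothetical, non-separable data: intercept column plus one feature
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])

theta = fit_newton(X, y)
p = 1.0 / (1.0 + np.exp(-X @ theta))
# At the maximum the gradient X^T (y - p) is numerically zero
print(theta, X.T @ (y - p))
```

Newton's method uses second-derivative information, so it typically converges in a handful of iterations on small problems like this one. The cost is solving a linear system at each step.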
Cost Function
We covered cost functions earlier in the blog on linear regression. In brief, a cost function is defined for optimization purposes: we minimize it to obtain a model with minimum error.
The cost function for logistic regression is:
Cost(hθ(x),y) = −log(hθ(x)) if y = 1
Cost(hθ(x),y) = −log(1−hθ(x)) if y = 0
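The two cases can be written together as:
$$\text{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$$
Averaged over m training examples, this gives the cost function J(θ):
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$$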
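A minimal sketch of the averaged cost J(θ) on hypothetical data. With θ = 0 every prediction is 0.5, so J equals −log(0.5) = log 2. A θ that separates the classes reasonably well gives a lower cost.

```python
import numpy as np


def cost(theta, X, y):
    """Average logistic (cross-entropy) cost J(theta) over m examples.

    X includes a leading column of ones for the intercept.
    """
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))


# Hypothetical data: intercept column plus one feature
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0, 0, 1, 1])

print(cost(np.zeros(2), X, y))              # log 2, about 0.693
print(cost(np.array([-4.0, 2.0]), X, y))    # lower cost for a better-fitting theta
```

In practice, libraries usually clip h away from exactly 0 and 1 before taking logs, so that a fully confident wrong prediction does not produce an infinite cost.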
Gradient Descent
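After finding the cost function for logistic regression, our job is to minimize it, i.e., to find min J(θ). The cost function can be minimized using gradient descent.
The general form of gradient descent repeats the following update until convergence, simultaneously for every parameter θj (α is the learning rate):
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
Solving the derivative with calculus, the update becomes:
$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$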
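A minimal sketch of batch gradient descent implementing this update rule on hypothetical data. The learning rate `alpha=0.1` and the iteration count are arbitrary choices for this toy example, not recommended defaults.

```python
import numpy as np


def gradient_descent(X, y, alpha=0.1, n_iter=5000):
    """Batch gradient descent for logistic regression.

    Update: theta_j := theta_j - (alpha / m) * sum_i (h(x_i) - y_i) * x_ij
    X includes a leading column of ones for the intercept.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        # Vectorized form updates every theta_j simultaneously
        theta -= (alpha / m) * (X.T @ (h - y))
    return theta


# Hypothetical, non-separable data (intercept column plus one feature)
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])

theta = gradient_descent(X, y)
print(theta)
```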
When to use Logistic Regression?
Logistic regression is used when the input needs to be separated into “two regions” by a linear decision boundary. The boundary is the set of points where θᵀx = 0, i.e., where the predicted probability is exactly 0.5; with two features, it is a straight line.
Based on the number of categories, Logistic regression can be classified as:
- binomial: the target variable can have only 2 possible types, “0” or “1”, which may represent “win” vs “loss”, “pass” vs “fail”, “dead” vs “alive”, etc.
- multinomial: the target variable can have 3 or more possible types that are not ordered (i.e., the types have no quantitative significance), like “disease A” vs “disease B” vs “disease C”.
- ordinal: the target variable has ordered categories. For example, a test score can be categorized as “very poor”, “poor”, “good”, or “very good”, and each category can be given a score like 0, 1, 2, 3.
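For the multinomial case, here is a minimal sketch with scikit-learn's `LogisticRegression` on the bundled Iris data set, which has three unordered classes. In recent scikit-learn versions, the default `lbfgs` solver fits a multinomial (softmax) model when there are more than two classes. scikit-learn has no built-in ordinal logistic regression, so that case is not shown.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris: 150 flowers, 4 numeric features, 3 unordered species (classes 0, 1, 2)
X, y = load_iris(return_X_y=True)

# max_iter raised so lbfgs converges on the unscaled features
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

probs = clf.predict_proba(X[:3])
print(probs.shape)         # one probability per class for each sample
print(probs.sum(axis=1))   # each row sums to 1
print(clf.predict(X[:3]))  # class with the highest probability
```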
Let us explore the simplest form of logistic regression, i.e., binomial logistic regression. It is used for classification problems where the y-variable takes on only two values. Such a variable is said to be “binary” or “dichotomous”. “Dichotomous” means two categories, such as yes/no, defective/non-defective, or success/failure; “binary” refers to the 0s and 1s used to encode them.
Choosing the correct type of logistic regression is essential for solving data-related problems, and the statistical skills you learn while getting a data science certification come in handy to narrow down the choices.
Linear vs Logistic Regression
Aspect | Linear Regression | Logistic Regression |
---|---|---|
Outcome | In linear regression, the outcome (dependent variable) is continuous. It can take any one of an infinite number of possible values. | In logistic regression, the outcome (dependent variable) has only a limited number of possible values. |
The dependent variable | Linear regression is used when the response variable is continuous, for instance weight, height, or number of hours. | Logistic regression is used when the response variable is categorical, for instance yes/no, true/false, red/green/blue, or 1st/2nd/3rd/4th. |
The independent variables | Strongly correlated independent variables (multicollinearity) make the coefficient estimates unstable, so they should be avoided. | The same holds: the independent variables should not be strongly correlated with each other (no multicollinearity). |
Equation | Linear regression gives an equation of the form Y = mX + C, i.e., an equation of degree 1. | Logistic regression gives an equation of the form Y = 1 / (1 + e^−(mX + C)), which always lies between 0 and 1. |
Coefficient interpretation | The interpretation is straightforward: holding all other variables constant, a unit increase in an independent variable changes the expected dependent variable by the value of its coefficient. | Logistic regression is a generalized linear model with a binomial family and a logit link. Each coefficient is the change in the log-odds of the outcome for a unit increase in that variable; exponentiating it gives an odds ratio. |
Error minimization technique | Linear regression uses the ordinary least squares (OLS) method, minimizing the sum of squared errors to arrive at the best possible fit. | Logistic regression uses maximum likelihood estimation (MLE), which is equivalent to minimizing the logistic loss (log loss). |
How is OLS different from MLE?
Linear regression is estimated using Ordinary Least Squares (OLS) while logistic regression is estimated using Maximum Likelihood Estimation (MLE) approach.
Ordinary least squares (OLS), also called linear least squares, is a method for estimating the unknown parameters of a linear regression model. OLS chooses the parameters that minimize the total squared vertical distance between two sets of values. The first set is the observed responses in the dataset. The second is the responses predicted by the linear approximation, represented by the line of best fit (regression line). When XᵀX is invertible, the resulting estimator has a simple closed form, β̂ = (XᵀX)⁻¹Xᵀy, where X is the matrix of independent variables and y is the vector of observed responses.
For example, suppose you have an overdetermined system: more equations than unknown parameters, so no exact solution exists. OLS is the standard approach to finding an approximate solution to such a system: it minimizes the sum of the squares of the errors in your equations. The parameters that best fit the data in the OLS sense minimize the sum of squared residuals. A residual is the difference between an observed value and the value predicted by the model.
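A minimal sketch of OLS on a hypothetical overdetermined system: five observations and two unknowns (intercept and slope). NumPy's `np.linalg.lstsq` minimizes the sum of squared residuals. Its answer is checked against the closed-form normal-equations estimate.

```python
import numpy as np

# Hypothetical data: 5 equations, 2 unknowns (intercept, slope)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])

# OLS: minimize the sum of squared residuals ||y - X beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Same estimate from the closed form beta = (X^T X)^(-1) X^T y
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta
print(beta, beta_closed)         # intercept about 0.05, slope about 1.99
print(np.sum(residuals ** 2))    # the minimized sum of squared residuals
```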
Maximum likelihood estimation (MLE) is a method for estimating the parameters of a statistical model and for fitting a statistical model to data. Suppose you want to know the distribution of heights of basketball players in a specific location but cannot afford to measure every player. If you assume the heights are normally distributed, you can measure a sample of players and use MLE to estimate the mean and variance of the heights. MLE chooses the values of these parameters under which the observed sample is most probable.
To sum up, MLE starts from a fixed data set and an assumed probability model (for example, a normal distribution). It selects the parameter values that make the observed data most likely, and it provides a unified approach to estimation across many models. However, MLE cannot always be used; for example, the likelihood may have no finite maximum. In logistic regression this happens when the two classes are perfectly separable, and the coefficient estimates grow without bound.
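A minimal sketch of the basketball example, using hypothetical heights rather than real measurements. For a normal model, the MLEs have closed forms: the sample mean and the variance computed with a divisor of n (not n − 1). Moving either parameter away from these values lowers the log-likelihood.

```python
import numpy as np

# Hypothetical sample of player heights in cm (not real measurements)
heights = np.array([198.0, 203.0, 191.0, 210.0, 201.0, 195.0, 207.0, 199.0])

# Closed-form MLEs for a normal distribution
mu_hat = heights.mean()                      # sample mean
var_hat = np.mean((heights - mu_hat) ** 2)   # divides by n, not n - 1


def log_likelihood(mu, var, data):
    """Normal log-likelihood of the data for a given mean and variance."""
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (data - mu) ** 2 / (2 * var))


best = log_likelihood(mu_hat, var_hat, heights)
# Nudging either parameter away from the MLE lowers the likelihood
print(best > log_likelihood(mu_hat + 1.0, var_hat, heights))
print(best > log_likelihood(mu_hat, var_hat * 1.2, heights))
```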
Building Logistic Regression Model
To build a logistic regression model, we can use the statsmodels library or the built-in LogisticRegression class in the scikit-learn (sklearn) library. The example below uses statsmodels on the German Credit data.
# Importing Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Reading German Credit Data
raw_data = pd.read_csv("/content/German_Credit_data.csv")
raw_data.head()
Building Logistic Regression Base Model after data preparation:
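import statsmodels.api as sm
# x_train and y_train come from the data-preparation step (not shown);
# x_train is assumed to already include an intercept column (e.g. added with sm.add_constant)
# Build the Logit model: dependent variable first, then the independent variables
logit = sm.Logit(y_train, x_train)
# Fit the model by maximum likelihood
model1 = logit.fit()
# Print the logistic regression model results
model1.summary2()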
Optimization terminated successfully.
Current function value: 0.480402
Iterations 6
Model: Logit Pseudo R-squared: 0.197
Dependent Variable: Creditability AIC: 712.5629
Date: 2019-09-19 09:55 BIC: 803.5845
No. Observations: 700 Log-Likelihood: -336.28
Df Model: 19 LL-Null: -418.79
Df Residuals: 680 LLR p-value: 2.6772e-25
Converged: 1.0000 Scale: 1.0000
No. Iterations: 6.0000
We will calculate the model accuracy on the test dataset using scikit-learn’s accuracy_score function.
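# Checking the accuracy with test data (predicted_df construction assumed; not shown in the original)
from sklearn.metrics import accuracy_score
# Predicted probabilities on the test set, converted to classes with a 0.5 threshold
predicted_df = pd.DataFrame({'Predicted_Prob': model1.predict(x_test)})
predicted_df['Predicted_Class'] = (predicted_df['Predicted_Prob'] >= 0.5).astype(int)
print(accuracy_score(y_test, predicted_df['Predicted_Class']))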
0.74
The model achieves an accuracy of 74% on the test data.
Model Evaluation
Model evaluation metrics serve three purposes: measuring the goodness of fit between a model and the data, comparing different models during model selection, and estimating how accurate the model's predictions are likely to be.
What is a Confusion Matrix?
A confusion matrix is a summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized with count values and broken down by class; this breakdown is the key to the confusion matrix. It shows the ways in which your classification model is confused when it makes predictions.
A confusion matrix gives insight not only into the errors your classifier makes but, more importantly, into the types of errors being made. It is this breakdown that overcomes the limitation of using classification accuracy alone.
How to Calculate a Confusion Matrix
Below is the process for calculating a confusion matrix:
- You need a test dataset or a validation dataset with expected outcome values.
- Make a prediction for each row in your test dataset.
- From the expected outcomes and predictions count:
- The number of correct predictions for each class.
- The number of incorrect predictions for each class, organized by the class that was predicted.
These numbers are then organized into a table or a matrix as follows:
- Predicted down the side: each row of the matrix corresponds to a predicted class.
- Actual (expected) across the top: each column of the matrix corresponds to an actual class.
The counts of correct and incorrect classifications are then filled into the table.
The total number of correct predictions for a class goes into the row for that predicted class and the column for the same actual class, so correct predictions lie on the diagonal.
The total number of incorrect predictions goes into the row for the class that was predicted and the column for the actual class.
2-Class Confusion Matrix Case Study
Suppose we have a two-class classification problem of predicting whether a photograph contains a man or a woman. We have a test dataset of 10 records with expected outcomes and a set of predictions from our classification algorithm:
Expected | Predicted |
---|---|
Man | Woman |
Man | Man |
Woman | Woman |
Man | Man |
Woman | Man |
Woman | Woman |
Woman | Woman |
Man | Man |
Man | Woman |
Woman | Woman |
Let’s start by calculating the classification accuracy for this set of predictions.
The algorithm made 7 of the 10 predictions correctly, so:
accuracy = total correct predictions / total predictions made * 100
accuracy = 7 / 10 * 100 = 70%
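The same calculation in plain Python, using the ten (expected, predicted) pairs from the case-study table above.

```python
# The ten (expected, predicted) pairs from the case-study table
expected = ["Man", "Man", "Woman", "Man", "Woman", "Woman", "Woman", "Man", "Man", "Woman"]
predicted = ["Woman", "Man", "Woman", "Man", "Man", "Woman", "Woman", "Man", "Woman", "Woman"]

# Count the rows where the prediction matches the expected outcome
correct = sum(e == p for e, p in zip(expected, predicted))
accuracy = 100 * correct / len(expected)
print(correct, accuracy)  # 7 70.0
```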
But what are the types of errors made?
We can determine that by turning our results into a confusion matrix:
First, we must calculate the number of correct predictions for each class.
- men classified as men: 3
- women classified as women: 4
Now, we can calculate the number of incorrect predictions for each class, organized by the predicted value:
- men classified as women: 2
- women classified as men: 1
We can now arrange these values into the 2-class confusion matrix:
predicted \ actual | men | women |
---|---|---|
men | 3 | 1 |
women | 2 | 4 |
From the above table we learn that:
- The total number of actual men in the dataset is the sum of the values in the men column.
- The total number of actual women in the dataset is the sum of the values in the women column.
- The correct values are organized in a diagonal line from top left to bottom-right of the matrix.
- More errors were made by predicting men as women than predicting women as men.
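A sketch of the same matrix with scikit-learn's `confusion_matrix`. Note that scikit-learn puts actual classes on the rows and predicted classes on the columns. That is the transpose of the layout used in this article, so the result is transposed to match.

```python
from sklearn.metrics import confusion_matrix

# The ten (expected, predicted) pairs from the case-study table
expected = ["Man", "Man", "Woman", "Man", "Woman", "Woman", "Woman", "Man", "Man", "Woman"]
predicted = ["Woman", "Man", "Woman", "Man", "Man", "Woman", "Woman", "Man", "Woman", "Woman"]

# scikit-learn layout: rows = actual class, columns = predicted class
cm = confusion_matrix(expected, predicted, labels=["Man", "Woman"])
print(cm)
# [[3 2]
#  [1 4]]

# Transposed to this article's layout: rows = predicted, columns = actual
print(cm.T)
# [[3 1]
#  [2 4]]
```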
Two-Class Problems Are Special
In a two-class problem, we are often looking to discriminate observations with a specific outcome from normal observations. An example is separating a disease state or event from a no-disease state or no-event. We can label the event class “positive” and the no-event class “negative”. A prediction is “true” when it matches the actual class and “false” when it does not.
This gives us:
- “true positive” for an event correctly predicted as an event.
- “false positive” for a no-event incorrectly predicted as an event.
- “true negative” for a no-event correctly predicted as a no-event.
- “false negative” for an event incorrectly predicted as a no-event.
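We can summarize this in the confusion matrix as follows:
predicted \ actual | event | no-event |
---|---|---|
event | true positive | false positive |
no-event | false negative | true negative |
Treating “man” as the event in the case study above gives 3 true positives, 1 false positive, 2 false negatives and 4 true negatives.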
This can help in calculating more advanced classification metrics such as the precision, recall, specificity and sensitivity of our classifier. With “man” as the positive class (TP = 3, FP = 1, FN = 2, TN = 4):
Sensitivity / recall = 3 / (3 + 2) = 0.6
Specificity = 4 / (4 + 1) = 0.8
Precision = 3 / (3 + 1) = 0.75
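A minimal sketch of these three metrics for the case study, with “Man” as the positive class. The counts are computed by hand and then cross-checked with scikit-learn. Specificity is simply the recall of the negative class.

```python
from sklearn.metrics import precision_score, recall_score

# The ten (expected, predicted) pairs from the case-study table
expected = ["Man", "Man", "Woman", "Man", "Woman", "Woman", "Woman", "Man", "Man", "Woman"]
predicted = ["Woman", "Man", "Woman", "Man", "Man", "Woman", "Woman", "Man", "Woman", "Woman"]

# Counts with "Man" as the positive (event) class
tp, fp, fn, tn = 3, 1, 2, 4

sensitivity = tp / (tp + fn)  # recall / true positive rate
specificity = tn / (tn + fp)  # true negative rate
precision = tp / (tp + fp)
print(sensitivity, specificity, precision)  # 0.6 0.8 0.75

# Cross-check with scikit-learn
print(recall_score(expected, predicted, pos_label="Man"))     # sensitivity
print(recall_score(expected, predicted, pos_label="Woman"))   # specificity
print(precision_score(expected, predicted, pos_label="Man"))  # precision
```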
The code mentioned below shows the implementation of confusion matrix in Python with respect to the example used earlier:
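# Confusion Matrix
from sklearn.metrics import confusion_matrix
# Store the result under a new name so the confusion_matrix function is not overwritten;
# for binary labels, ravel() returns the counts in the order (tn, fp, fn, tp)
cm = confusion_matrix(y_test, predicted_df['Predicted_Class']).ravel()
cm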
array([ 37, 63, 15, 185])
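scikit-learn orders the flattened counts as true negatives, false positives, false negatives and true positives. So 37 (true negatives) and 185 (true positives) are the correct predictions, while 63 (false positives) and 15 (false negatives) are the incorrect ones. As a check, (37 + 185) / 300 = 0.74, which matches the accuracy above.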
Receiver Operating Characteristic (ROC)
The receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true positive rate is also known as sensitivity, recall, or probability of detection. The false positive rate is also known as the fall-out and can be calculated as (1 − specificity). The ROC curve is thus sensitivity plotted as a function of fall-out.
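A minimal sketch using scikit-learn's `roc_curve` and `roc_auc_score` on four hypothetical labels and scores. `roc_curve` returns one (FPR, TPR) point per candidate threshold. The area under the curve (AUC) summarizes the whole curve in a single number.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th}: FPR={f:.2f}, TPR={t:.2f}")

# Area under the ROC curve
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```

An AUC of 0.75 here means that in 3 of the 4 (positive, negative) pairs, the positive example gets the higher score.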
There are a number of ways to evaluate whether a logistic model is a good model. One is to use sensitivity and specificity, which are statistical measures of the performance of a binary classification test (known in statistics as a classification function):
Sensitivity (also called recall or the true positive rate) measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition), and is complementary to the false negative rate. It shows how good a test is at detecting the positives. A test can cheat and maximize this by always returning “positive”.
Sensitivity = true positives / (true positives + false negatives)
Specificity (also called the true negative rate) measures the proportion of negatives which are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition), and is complementary to the false positive rate. It shows how good a test is at avoiding false alarms. A test can cheat and maximize this by always returning “negative”.
Specificity = true negatives / (true negatives + false positives)
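The "cheating" tests mentioned above are easy to reproduce, and they show why sensitivity and specificity are reported together. The sketch uses a hypothetical test set of 30 sick and 70 healthy people.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives flagged as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives cleared as negative."""
    return tn / (tn + fp)


# Hypothetical test set: 30 sick (positive) and 70 healthy (negative) people.
# A test that always answers "positive" flags everyone as sick:
tp, fn, tn, fp = 30, 0, 0, 70
print(sensitivity(tp, fn), specificity(tn, fp))  # 1.0 0.0

# A test that always answers "negative" clears everyone:
tp, fn, tn, fp = 0, 30, 70, 0
print(sensitivity(tp, fn), specificity(tn, fp))  # 0.0 1.0
```

Each degenerate test scores perfectly on one measure and zero on the other, so neither number alone says whether a test is useful.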
Precision measures the proportion of positive predictions that are actually positive. It is often used together with recall (in information retrieval, the percentage of all relevant documents that a search returns), and the two are combined in the F1 score (or F-measure) to give a single measurement of a system. Precision shows how many of the positively classified items were relevant. A test can cheat and maximize it by returning “positive” only for the one result it is most confident in.
Precision = true positives / (true positives + false positives)
The precision-recall curve shows the trade-off between precision and recall at different thresholds. The choice of threshold is largely driven by the precision and recall it yields. Ideally, both precision and recall would be 1, but this is seldom the case. When precision and recall must be traded off, the following arguments help decide on the threshold:
Low Precision/High Recall: In applications where we want to reduce the number of false negatives without necessarily reducing the number of false positives, we choose a decision threshold that gives high recall, even at the cost of low precision. For example, in a cancer diagnosis application, we do not want any affected patient to be classified as unaffected, even if this means some healthy patients are wrongly diagnosed with cancer. This is because the absence of cancer can be confirmed by further medical tests, but the disease cannot be detected in a patient who has already been cleared.
High Precision/Low Recall: In applications where we want to reduce the number of false positives without necessarily reducing the number of false negatives, we choose a decision threshold that gives high precision, even at the cost of low recall. For example, if we are predicting whether customers will react positively or negatively to a personalised advertisement, we want to be very sure that a targeted customer will react positively, because a negative reaction can cost potential sales from that customer.
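The trade-off can be seen by scoring the same predictions at a few thresholds. This is a plain-Python sketch with hypothetical labels and probabilities; reporting precision as 1.0 when nothing is predicted positive is a convention chosen here to avoid dividing by zero.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when samples with score >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted positive
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical labels and predicted probabilities
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

for threshold in (0.25, 0.5, 0.75):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.25: precision=0.67, recall=1.00
# threshold=0.5: precision=0.75, recall=0.75
# threshold=0.75: precision=1.00, recall=0.50
```

A low threshold suits the cancer-screening case (no positive is missed), while a high threshold suits the advertising case (every flagged customer is a true positive).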
The code below prints the precision, recall, F1-score and support for each class of the example used earlier:
from sklearn.metrics import classification_report
print(classification_report(y_test, predicted_df['Predicted_Class']))
The f1-score is the harmonic mean of precision and recall for a particular class, F1 = 2 * precision * recall / (precision + recall), so it is high only when both are high. The support is the number of actual samples of that class in y_test.
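As a worked example of the harmonic mean, the sketch computes F1 for the positive class from the counts in the confusion matrix output earlier (tp = 185, fp = 63, fn = 15), assuming 1 is the positive label.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Positive-class counts from the confusion matrix above: tp=185, fp=63, fn=15
precision = 185 / (185 + 63)
recall = 185 / (185 + 15)
print(round(f1(precision, recall), 3))  # 0.826
```

The arithmetic mean of the same precision and recall would be about 0.835; the harmonic mean is pulled toward the smaller of the two values.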
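import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
# model1 is assumed to be the logistic model fitted earlier. For a statsmodels Logit
# result, predict() returns probabilities; with scikit-learn, use model1.predict_proba(x_test)[:, 1]
y_pred_prob = model1.predict(x_test)
# Generate ROC curve values: fpr, tpr, thresholds
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
# Plot the ROC curve against the diagonal of a random classifier
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()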
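# AUC
from sklearn.metrics import roc_auc_score
# Passing hard class labels (as here) scores a single threshold; passing y_pred_prob
# instead gives the area under the ROC curve plotted above
roc_auc_score(y_test, predicted_df['Predicted_Class'])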
0.6475
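With the predicted classes as input, the Area Under the Curve is 0.6475. For hard 0/1 predictions this equals the average of the true positive rate (185/200 = 0.925) and the true negative rate (37/100 = 0.37); passing the predicted probabilities instead measures the area under the full ROC curve.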
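The 0.6475 can be reproduced without scikit-learn. AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counted as one half. The sketch rebuilds label vectors from the counts reported earlier (tn=37, fp=63, fn=15, tp=185); the helper name `auc_pairwise` is our own.

```python
# Rebuild label vectors from the reported counts: tn=37, fp=63, fn=15, tp=185
y_true = [0] * 100 + [1] * 200
y_pred = [0] * 37 + [1] * 63 + [0] * 15 + [1] * 185


def auc_pairwise(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auc_pairwise(y_true, y_pred))  # 0.6475
```

Because the predictions take only two values, many positive/negative pairs tie, which is why hard labels give a lower AUC than well-ranked probabilities usually would.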
Hosmer Lemeshow Goodness-of-Fit
- It measures the association between actual events and predicted probability.
- How well the model fits depends on the difference between the model's predictions and the observed data. One approach for binary data is the Hosmer-Lemeshow goodness-of-fit test.
- In the HL test, the null hypothesis states that the model fits the data well. The model appears to fit well if there is no significant difference between the model and the observed data (i.e., p-value > 0.05, so we do not reject H0).
- Or in other words, if the test is NOT statistically significant, that indicates the model is a good fit.
- As with all measures of model fit, use this as just one piece of information in deciding how well this model fits. It doesn’t work well in very large or very small data sets, but is often useful nonetheless.
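$$G^2_{HL} = \sum_{j=1}^{J} \frac{(O_j - E_j)^2}{E_j\,(1 - E_j/n_j)} \sim \chi^2_{J-2}$$
χ² = chi-squared distribution, here with J - 2 degrees of freedom.
J = number of groups (commonly 10, formed from deciles of predicted probability).
nj = number of observations in the jth group.
Oj = number of observed cases in the jth group.
Ej = number of expected cases in the jth group (the sum of the predicted probabilities in that group).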
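The statistic can be sketched in a few lines: sort observations by predicted probability, split them into groups, and compare observed with expected event counts. This is an illustrative implementation under stated assumptions (equal-sized groups after sorting, at least `groups` observations, no group whose expected count is 0 or equal to its size), not a reference implementation; the p-value uses `scipy.stats.chi2.sf`. On the article's data it could be called as `hosmer_lemeshow(y_test, y_pred_prob)`.

```python
from scipy.stats import chi2


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic and p-value (sketch).

    Observations are sorted by predicted probability and split into
    `groups` roughly equal-sized bins (deciles of risk when groups=10).
    """
    pairs = sorted(zip(y_prob, y_true))  # sort by predicted probability
    n = len(pairs)
    stat = 0.0
    for j in range(groups):
        # Slice j of the sorted observations (sizes differ by at most one)
        chunk = pairs[j * n // groups:(j + 1) * n // groups]
        n_j = len(chunk)
        o_j = sum(label for _, label in chunk)  # observed events O_j
        e_j = sum(prob for prob, _ in chunk)    # expected events E_j
        stat += (o_j - e_j) ** 2 / (e_j * (1 - e_j / n_j))
    p_value = chi2.sf(stat, groups - 2)  # H0: the model fits; df = J - 2
    return stat, p_value


# Hypothetical, perfectly calibrated data: in each probability band the
# observed number of events equals the sum of the predicted probabilities.
y_prob = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
y_true = [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 0]
stat, p = hosmer_lemeshow(y_true, y_prob, groups=4)
print(round(stat, 6), round(p, 3))  # 0.0 1.0
```

A large statistic (small p-value) means observed and expected counts disagree, so the null hypothesis of good fit would be rejected.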
Gini Coefficient
The Gini coefficient is sometimes used in classification problems.
The Gini coefficient can be derived directly from the ROC AUC. It is the ratio of the area between the ROC curve and the diagonal line to the area of the triangle above the diagonal. The formula is:
Gini = 2*AUC - 1
As a common rule of thumb, a Gini above 60% indicates a good model.
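Applied to the AUC of 0.6475 obtained earlier, the formula gives a Gini of about 0.295. The one-line helper below is our own illustration.

```python
def gini_from_auc(auc):
    """Gini coefficient from ROC AUC: Gini = 2 * AUC - 1."""
    return 2 * auc - 1


print(round(gini_from_auc(0.6475), 3))  # 0.295
```

A random classifier (AUC 0.5) gets a Gini of 0 and a perfect one (AUC 1) gets 1, so 0.295 falls well short of the 60% rule of thumb.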
Akaike Information Criterion and Bayesian Information Criterion
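AIC and BIC play a role similar to adjusted R-squared in linear regression: they reward goodness of fit while penalizing the number of parameters. When comparing models fitted to the same data, lower values indicate a better model. With L the maximized likelihood of the model, k the number of estimated parameters and n the number of observations:
AIC = -2ln(L) + 2k
BIC = -2ln(L) + k*ln(n)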
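For a logistic model, both criteria come from the Bernoulli log-likelihood, using AIC = -2 ln(L) + 2k and BIC = -2 ln(L) + k ln(n). The sketch computes them from labels and fitted probabilities; the data and k = 2 (intercept plus one slope) are hypothetical. If the model was fitted with statsmodels, its results object also exposes `llf`, `aic` and `bic` attributes.

```python
import math


def log_likelihood(y_true, y_prob):
    """Bernoulli log-likelihood of labels given predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y, p in zip(y_true, y_prob))


def aic_bic(y_true, y_prob, k):
    """AIC and BIC for a model with k estimated parameters (including the intercept)."""
    ll = log_likelihood(y_true, y_prob)
    n = len(y_true)
    aic = -2 * ll + 2 * k
    bic = -2 * ll + k * math.log(n)
    return aic, bic


# Hypothetical labels and fitted probabilities from a model with k = 2
y_true = [0, 0, 1, 1]
y_prob = [0.2, 0.4, 0.6, 0.9]
aic, bic = aic_bic(y_true, y_prob, k=2)
print(round(aic, 3), round(bic, 3))  # 6.7 5.473
```

With only four observations ln(n) < 2, so BIC penalizes less than AIC here; once n exceeds about 7.4, BIC's penalty per parameter is the larger of the two.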
Pros and Cons of Logistic Regression
Many of the pros and cons of the linear regression model also apply to the logistic regression model. Although logistic regression is widely used to solve many types of problems, its limitations can hold back its performance, and other predictive models can give better predictive results.
Pros
The logistic regression model is not only a classification model; it also gives you probabilities. This is a big advantage over models that can only provide the final classification: knowing that an instance has a 99% probability of belonging to a class rather than 51% makes a big difference. Logistic regression performs well when the classes are approximately linearly separable.
Logistic regression gives not only a measure of how relevant a predictor is (coefficient size) but also its direction of association (positive or negative). It is also easy to implement and interpret, and very efficient to train.
Cons
Logistic regression can suffer from complete separation. If a feature perfectly separates the two classes, the logistic regression model can no longer be trained: the weight for that feature does not converge, because the optimal weight would be infinite. This is unfortunate, because such a feature is very useful. But you do not need machine learning if a simple rule separates both classes. The problem of complete separation can be solved by penalizing the weights or by defining a prior probability distribution over the weights.
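The divergence, and the effect of penalizing the weights, can be seen with a tiny hypothetical dataset that one feature separates perfectly. This sketch fits a single weight (no intercept, for brevity) by plain gradient ascent, not by the solver that statsmodels or scikit-learn uses; `l2` is the strength of a hypothetical L2 penalty.

```python
import math


def fit_weight(xs, ys, l2=0.0, lr=0.5, steps=2000):
    """Fit w in P(y=1|x) = sigmoid(w * x) by gradient ascent on the
    (optionally L2-penalized) log-likelihood. No intercept, for brevity."""
    w = 0.0
    for _ in range(steps):
        grad = sum((y - 1 / (1 + math.exp(-w * x))) * x for x, y in zip(xs, ys))
        grad -= l2 * w  # derivative of the penalty -(l2 / 2) * w**2
        w += lr * grad
    return w


# Perfectly separated hypothetical data: every x < 0 is class 0, every x > 0 is class 1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

for steps in (100, 1000, 10000):
    print(steps, round(fit_weight(xs, ys, steps=steps), 2))  # keeps growing with more steps
print("L2:", round(fit_weight(xs, ys, l2=1.0, steps=10000), 2))  # settles near 1.0
```

Without the penalty the gradient stays positive for every finite w, so the weight keeps creeping upward; with it, the gradient reaches zero at a finite weight.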
Logistic regression is less prone to overfitting, but it can overfit on high-dimensional datasets; in such cases, regularization techniques should be considered.
In this article, we have seen what Logistic Regression is, how it works, when we should use it, comparison of Logistic and Linear Regression, the difference between the approach and usage of two estimation techniques: Maximum Likelihood Estimation and Ordinary Least Square Method, evaluation of model using Confusion Matrix and the advantages and disadvantages of Logistic Regression. We have also covered some basics of sigmoid function, cost function, and gradient descent.
Check out KnowledgeHut’s machine learning with python projects for hands-on application of regression and other data science concepts.
Frequently Asked Questions (FAQs)
1. How many kinds of Logistic Regression for Machine Learning are possible?
Logistic Regression is broadly of three types:
1. Binary
2. Multinomial
3. Ordinal.
2. What is Logistic Regression used for in Machine Learning?
Logistic Regression is one of the supervised learning methods used to find and build the best-fit relationship between dependent and independent variables in order to predict future outcomes.
3. What is the function that Logistic Regression for Machine Learning uses?
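Logistic Regression for Machine Learning uses the sigmoid (logistic) function, σ(z) = 1 / (1 + e^(-z)), to map a linear combination of the input features to a probability between 0 and 1, which gives the best-fit S-shaped curve.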
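A minimal sketch of the sigmoid applied to a linear combination z = b0 + b1 * x; the coefficients below are hypothetical, chosen only to show the S-shape.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to a value in (0, 1)."""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients for z = b0 + b1 * x
b0, b1 = -3.0, 1.5
for x in (0.0, 2.0, 4.0):
    print(x, round(sigmoid(b0 + b1 * x), 3))
# 0.0 0.047
# 2.0 0.5
# 4.0 0.953
```

This direct form overflows in `math.exp` for very large negative z; numerically stable versions such as `scipy.special.expit` avoid that.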
4. What is Logistic Regression?
Logistic regression is a statistical analysis approach that predicts a data value based on previous observations from a data set. The method enables a Machine Learning system to categorize input data based on past data. The system should improve its ability to anticipate classes within data sets as more relevant data is received. During the extract, transform and load process, logistic regression can help with data preparation by allowing data sets to be placed into specified buckets to stage data for analysis. A logistic regression model analyses the relationship between one or more existing independent variables and a dependent variable.
5. How does Logistic Regression help machine learners?
Logistic regression is one of the most well-known Machine Learning algorithms in the Supervised Learning category. It is used to predict the output of a categorical dependent variable, so the result must be categorical or discrete: Yes/No, 0/1, true/false, etc. Instead of giving exact values of 0 or 1, however, it gives probabilistic values between 0 and 1. Because it can generate probabilities and classify new data using both continuous and discrete data, logistic regression is a key Machine Learning approach. It can be used to classify observations based on multiple types of data and can identify the features that are most useful for classification.
6. Is logistic regression a topic from Mathematics or Computer Science?
Machine Learning (ML) is a part of Data Science that lies at the confluence of Computer Science and Mathematics, with data-driven learning as its core. Supervised Learning, Unsupervised Learning, and Reinforcement Learning are the three subparts of Machine Learning, depending on the kind of learning. Supervised Learning is the most common and well-known of these learning styles. The two primary kinds of problems tackled in Supervised Learning are Classification and Regression. Classification is useful for categorizing data, and classification models can be further subdivided into generative and discriminative models. Logistic regression is one of the most widely used discriminative models.