The logistic regression algorithm is a standard classification algorithm used when the dependent variable is categorical. Typical examples include classifying an email as spam or not spam and predicting whether or not a patient has cancer.
The classification problem is a supervised learning problem in which the target variable is categorical. Although the name and the use of the logistic function suggest a regression problem, the algorithm performs classification: rather than outputting a class directly, logistic regression gives the probability of a data point belonging to each class, and that probability is then converted into a class label. In this sense, logistic regression is a simple, probabilistic classification algorithm.
Let’s take a detailed example to understand the classification problem better. Deciding whether a tumour is benign or malignant is a classification problem: the example classifies outcomes into distinct classes, and the result falls into exactly one of the two classes - benign or malignant.
Another example is predicting whether a customer will default on their loan, a classification problem of high interest to companies in the finance sector. Note that logistic regression (even when approached with a neural network mindset) is a supervised learning algorithm; unsupervised algorithms such as hierarchical clustering and k-means clustering solve a different kind of problem, where no target labels are available.
Logistic Regression
Logistic Regression is a popular classification method widely applied in machine learning. The logistic regression algorithm uses a logistic (sigmoid) function to model the dependent variable.
The logistic regression definition in machine learning states that the dependent variable is dichotomous in nature, so there can be only two possible classes. In the classification problem discussed above, the classes are benign and malignant. This basic logistic regression technique is therefore well suited to binary data.
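To make this concrete, here is a minimal sketch in Python (NumPy) of how the logistic function turns a linear score into a class probability; the weight, bias, and inputs are made-up values for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A linear score w*x + b is passed through the sigmoid to obtain a probability.
w, b = 1.5, -0.5                     # illustrative weight and bias
x = np.array([-2.0, 0.0, 1.0, 3.0])  # illustrative inputs
prob_malignant = sigmoid(w * x + b)

# Thresholding the probability at 0.5 yields one of the two classes.
predicted_class = (prob_malignant >= 0.5).astype(int)
print(prob_malignant)   # approx. [0.03 0.38 0.73 0.98]
print(predicted_class)  # [0 0 1 1]
```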
Loss Function
The linear regression algorithm uses the Mean Squared Error (MSE) loss function:
MSE = (1/n) ∑ (y − ŷ)²
Here, y − ŷ is the difference between the actual and predicted values.
The MSE sums the square of the distance between the actual and predicted outcome values over all input samples, then divides the sum by the number of input samples, giving the average squared error. This yields a single, easily compared measure of how far the model's predictions are from the truth.
The distance between the actual and predicted outcome values is squared so that samples whose predicted value is far from the actual value are penalised much more heavily than samples whose predicted value is close to it.
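As a quick illustration, here is a small NumPy sketch of the MSE computation described above (the sample values are invented):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])  # actual values (made up)
y_pred = np.array([2.5, 5.0, 9.0])  # predicted values (made up)

# The third prediction is off by 2 and contributes 4 to the sum,
# far more than the first, which is off by only 0.5 (contributing 0.25).
print(mse(y_true, y_pred))  # (0.25 + 0 + 4) / 3 ≈ 1.4167
```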
Cost Function
The cost function for logistic regression is a mathematical formula that measures the error between the actual and predicted values. It quantifies how wrong the model is in its ability to assess the relationship between x and y. The cost function returns a single value, named the 'cost', 'loss', or 'error'.
The logistic regression cost function is shown with the following equation:
Cost(hΘ(x), y) = −log(hΘ(x))      if y = 1
               = −log(1 − hΘ(x))  if y = 0
Here, the negative sign reflects that we maximise the likelihood by minimising the loss function; this is exactly what happens when we train the logistic regression model. Reducing the cost raises the likelihood, under the assumption that the samples are drawn from an independent and identically distributed (i.i.d.) population.
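Below is a minimal NumPy sketch of this cost function, averaged over a batch of samples; the labels and predicted probabilities are made up, and the small eps guard is an added assumption to avoid log(0):

```python
import numpy as np

def logistic_cost(h, y, eps=1e-12):
    """Average logistic regression cost:
    -log(h) when y = 1, -log(1 - h) when y = 0.
    eps clips predictions away from 0 and 1 to avoid log(0)."""
    h = np.clip(h, eps, 1 - eps)
    return np.mean(-(y * np.log(h) + (1 - y) * np.log(1 - h)))

y = np.array([1, 0, 1, 0])          # actual labels (made up)
h = np.array([0.9, 0.1, 0.8, 0.3])  # predicted probabilities h_theta(x)
print(logistic_cost(h, y))          # low cost: predictions match labels

h_bad = np.array([0.2, 0.9, 0.1, 0.8])
print(logistic_cost(h_bad, y))      # much higher cost: predictions disagree
```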
Bias-Variance tradeoff
Bias describes a model that fits the training data poorly but generates nearly identical results on data outside the training set. In simple terms, high-bias models are simple models that forecast far from reality, without significant changes from one dataset to another.
For instance, a linear regression model will show high bias when attempting to model a non-linear relationship, because a straight line cannot properly fit non-linear relationships.
High bias: linear regression applied to a quadratic relationship.
Low bias: a second-degree polynomial applied to quadratic data.
There is a trade-off between accuracy on the training data and generalization of the learned pattern outside it, hence the name: enhancing the model's accuracy on the training set leads to less generalization beyond it. Bias and variance move in opposite directions, so reducing one tends to increase the other.
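The following sketch (NumPy, with invented synthetic data) illustrates the trade-off by fitting polynomials of different degrees to quadratic data and comparing error on held-out points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic ground truth with noise.
x = np.linspace(-3, 3, 60)
y = 0.5 * x ** 2 + rng.normal(scale=0.5, size=x.size)

# Alternate points go to the train and test sets.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

for degree in (1, 2, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(test_err, 3))
# degree 1:  high bias  (a line underfits the curve)
# degree 2:  matches the true form of the data
# degree 10: high variance (the extra flexibility fits noise)
```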
Gradient Descent
Gradient Descent is an optimization algorithm useful for finding a local or global minimum of a differentiable function. Here, the algorithm aims to optimize the values of the bias and weight; therefore, the process of computing the gradients and updating the bias and weight is repeated several times.
Before thoroughly understanding Gradient Descent, let’s go through certain assumptions and definitions:
Suppose we have an independent variable x and a dependent variable y.
This equation establishes a relationship between these variables:
y = x * w + b
here, w: weight (or slope),
b: bias (or intercept),
x: independent variable column vector
y: dependent variable column vector
Loss Function: This function denotes the amount of deviation of predicted values from the actual values of dependent variables.
The key goal behind the implementation of Gradient Descent is to find the b and w that establish the relationship between the x and y variables. This is accomplished by minimising the Loss Function.
Steps for implementing Gradient Descent in linear regression (a code sketch follows these steps):
Step-1: Initialize the bias and weight randomly or with 0
Step-2: Make predictions using the above initial values of the bias and weight.
Step-3: Now compare the actual values with these predicted values, and define the loss function using both.
Step-4: Using differentiation, measure how the loss function changes with respect to the bias and weight (i.e. compute the gradients).
Step-5: Update the bias and weight to diminish the loss function.
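Putting these five steps together, here is a minimal NumPy sketch of gradient descent for the linear model y = x * w + b with the MSE loss; the synthetic data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Synthetic data following y = x * w + b (true w = 2.0, b = 1.0) plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Step 1: initialize the bias and weight with 0.
w, b = 0.0, 0.0
lr = 0.1  # learning rate (step size), chosen by hand

for _ in range(500):
    # Step 2: make predictions with the current weight and bias.
    y_pred = x * w + b
    # Step 3: the loss is the MSE between actual and predicted values.
    error = y_pred - y
    # Step 4: gradients of the MSE loss with respect to w and b.
    dw = 2 * np.mean(error * x)
    db = 2 * np.mean(error)
    # Step 5: update the weight and bias to diminish the loss.
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))  # should recover roughly 2.0 and 1.0
```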
Root Mean Squared Error
Root Mean Square Error (RMSE) denotes how efficiently a regression line fits the data points, and it can be interpreted as the standard deviation of the residuals. The problem with MSE is that its units are the square of the data's units; taking the square root of the MSE brings the error back to the same scale as the data.
Its formula is:
RMSE = √( (1/N) ∑ (ŷ − y)² )
Note: Because the square root is monotonic, minimising the RMSE yields the same solution as minimising the MSE; the square root merely reduces the order of the loss function back to the data's scale.
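A short sketch, reusing the sample values from the MSE example above:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of the MSE,
    expressed in the same units as the data."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 9.0])
print(rmse(y_true, y_pred))  # sqrt(1.4167) ≈ 1.19
```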
Recall, Precision, and F1 score:
To understand Recall, Precision, and the F1 score, let’s suppose we are assisting a rescue shelter for goats and sheep. The shelter gives a survey to people who don’t know whether they want a goat or a sheep; based on the survey results, an appointed agent determines which animal the person will like more. The shelter aims to have fewer people returning their adopted pets.
Precision:
Precision= tp/(tp + fp)
When the model predicts that a person likes goats, precision is the frequency with which that prediction is correct.
Recall:
Recall = tp/(tp + fn)
Recall measures what fraction of the people who truly like goats the model correctly identifies.
F1 score:
F1 = 2/(recall⁻¹ + precision⁻¹) = 2 × (precision × recall)/(precision + recall)
The F1 score, i.e. the harmonic mean of recall and precision, is useful when the samples are unbalanced.
Accuracy = (tp + tn) / (tp + tn + fp + fn)
Accuracy is the sum of true positives and true negatives divided by the total number of samples. It is informative only when the classes are balanced; on unbalanced data it can be misleading.
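Here is a small self-contained sketch that computes all four metrics from confusion-matrix counts; the counts for the "likes goats" class are hypothetical:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Hypothetical counts for the "likes goats" class:
# 40 correct goat predictions, 10 sheep-lovers wrongly labelled as goat-lovers,
# 20 goat-lovers missed, and 30 correct sheep predictions.
print(classification_metrics(tp=40, fp=10, fn=20, tn=30))
# ≈ (0.8, 0.667, 0.727, 0.7) -- accuracy looks fine, but recall reveals
# that a third of the goat-lovers were missed.
```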
Applications
i. For heart disease prediction:
Medical researchers want to know how weight and exercise influence the probability of a heart attack. To understand the relationship between the probability of having a heart attack and these predictor variables, they can use logistic regression. In heart disease prediction with logistic regression, the response variable in the model is whether a heart attack occurs.
There are two possible outcomes: a heart attack occurs or it doesn’t. The results of the model tell researchers how changes in weight and exercise influence the probability that a person has a heart attack.
ii. For email spam detection:
A business wants to know whether the country of origin and word count influence the probability that an email is spam. Researchers can carry out logistic regression to infer the relationship between the probability of an email being spam and these two predictor variables.
In the model, the response variable is whether the email is spam, with two possible outcomes: the email is spam or it is not. The model’s output tells the business how changes in the country of origin and word count influence the likelihood that a given email is spam.
iii. Credit card fraud detection:
Suppose a credit card company wants to know whether credit score and transaction amount influence the probability that a particular transaction is fraudulent. Credit card fraud detection using logistic regression helps the company infer the relationship between the probability of a transaction being fraudulent and these predictor variables.
In the model, the response variable is ‘fraudulent’, with two possible outcomes: the transaction is fraudulent or it is not. Based on the model’s output, the company can determine how changes in credit score and transaction amount relate to the probability that a particular transaction is fraudulent.
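As a rough sketch of how such a model might be built, the following uses scikit-learn’s LogisticRegression on synthetic data; the feature distributions and the fraud-generating rule are entirely invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 1000

# Synthetic predictor variables: credit score and transaction amount.
credit_score = rng.normal(650, 80, n)
amount = rng.exponential(200, n)

# Invented rule: fraud is more likely for low scores and large amounts.
logit = -4 + 0.01 * (600 - credit_score) + 0.005 * amount
is_fraud = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([credit_score, amount])
model = LogisticRegression(max_iter=1000).fit(X, is_fraud)

# Predicted probability that one new transaction is fraudulent.
print(model.predict_proba([[520.0, 900.0]])[0, 1])
```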