Gradient Descent in Machine Learning: How Does it Work?

Updated on 23 September, 2022


Introduction

One of the most crucial parts of Machine Learning is the optimization of its algorithms. Almost every Machine Learning algorithm has an optimization algorithm at its core. Optimization is a central goal well beyond Machine Learning too, whether in real-life decisions or in technology-based products on the market.

Many optimization algorithms are in use today in applications such as face recognition, self-driving cars, and market-based analysis. One of the most widely used in Machine Learning is the Gradient Descent algorithm, which we shall go through in this article.


What is Gradient Descent?

In Machine Learning, Gradient Descent is one of the most widely used algorithms, and yet it confuses most newcomers. Mathematically, Gradient Descent is a first-order iterative optimization algorithm that is used to find a local minimum of a differentiable function. In simple terms, it is used to find the values of a function’s parameters (or coefficients) that make a cost function as low as possible. The cost function quantifies the error between the predicted values and the real values of a Machine Learning model.

Gradient Descent Intuition

Consider a large bowl, like the one you would normally keep fruit in or eat cereal from. This bowl will be the cost function (f).

Now, a random position on the surface of the bowl represents the current values of the coefficients, and its height is the cost for those values. The bottom of the bowl corresponds to the best set of coefficients and is the minimum of the function.

Here, the goal is to try different values of the coefficients in each iteration, evaluate the cost, and choose the coefficients that give a better (lower) cost. After multiple iterations, the coefficients reach the bottom of the bowl, where the cost function is minimized.

In this way, the Gradient Descent algorithm works its way towards the minimum cost.

Gradient Descent Procedure

The gradient descent procedure begins by assigning initial values to the coefficients of the cost function. These could be either 0.0 or small random values.

coefficient = 0.0

Next, the cost is obtained by plugging the coefficients into the cost function.

cost = f(coefficient)

Then, the derivative of the cost function is calculated. The derivative, a concept from differential calculus, gives the slope of the function at the point where it is calculated. This slope tells us in which direction the coefficient should be moved in the next iteration to get a lower cost value: the sign of the derivative points uphill, so the coefficient is moved the opposite way.

delta = derivative(cost)

Once we know which direction is downhill from the derivative calculated, we need to update the coefficient values. For this, a parameter known as the learning rate, alpha (α), is used. It controls how much the coefficients can change with every update.

coefficient = coefficient - (alpha * delta)


This process is repeated until the cost stops decreasing meaningfully, that is, until the derivative is close to zero. For some cost functions the minimum cost is 0.0, but in general it need not be. This is the procedure for the gradient descent algorithm.
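The steps of the procedure can be put together in a short Python sketch. The cost function f(x) = (x - 3)^2, the tolerance, and the iteration cap are illustrative assumptions, not part of the procedure itself. Note that the derivative is evaluated at the current coefficient.

```python
def f(coefficient):
    """Hypothetical cost function: a one-dimensional bowl with its minimum at 3.0."""
    return (coefficient - 3.0) ** 2


def derivative(coefficient):
    """Analytic derivative of f, i.e. the slope of the bowl at this point."""
    return 2.0 * (coefficient - 3.0)


def gradient_descent(alpha=0.1, tolerance=1e-8, max_iterations=1000):
    coefficient = 0.0  # initial value of the coefficient
    iterations = 0
    for iterations in range(max_iterations):
        delta = derivative(coefficient)
        if abs(delta) < tolerance:  # the slope is almost flat, so stop
            break
        coefficient = coefficient - (alpha * delta)
    return coefficient, f(coefficient), iterations


coefficient, cost, iterations = gradient_descent()
print(coefficient, cost, iterations)
```

With these assumed settings the coefficient settles very close to 3.0, the bottom of this particular bowl, well before the iteration cap is reached.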

Types of Gradient Descent Algorithms

Three basic types of Gradient Descent are used in modern machine learning and deep learning algorithms. They differ mainly in how much data is used for each update, which in turn affects their computational cost, speed, and accuracy.

  1. Batch Gradient Descent
  2. Stochastic Gradient Descent
  3. Mini Batch Gradient Descent

Batch Gradient Descent

This is the basic version of the Gradient Descent algorithm, in which the entire dataset is used at once to compute the cost function and its gradient. As the entire dataset is used for a single update, calculating the gradient can be very slow, and it is not feasible for datasets that do not fit in the device’s memory.

Thus, Batch Gradient Descent is used only for smaller datasets. When the number of training examples is large, the Stochastic and Mini Batch Gradient Descent algorithms are used instead.

Stochastic Gradient Descent

This is another type of gradient descent algorithm in which only one training example is processed per iteration. The first step is to shuffle the entire training dataset. Then, a single training example is used for each update of the coefficients. This is in contrast to Batch Gradient Descent, in which the parameters (coefficients) are updated only after all the training examples have been evaluated.

Stochastic Gradient Descent (SGD) has the advantage that these frequent updates give a detailed picture of the rate of improvement. However, in certain cases it may turn out to be computationally expensive, as it processes only one example per iteration and the number of iterations can therefore become very large.

Mini Batch Gradient Descent

This algorithm is a combination of both the previously mentioned algorithms and is often faster in practice than either of them, which is why it is mostly preferred. It separates the training set into several mini-batches and performs an update for each of these batches after calculating the gradient of that batch (like in SGD).

Commonly, the batch size varies between 30 and 500, but there isn’t any fixed size, as it varies between applications. Hence, even if there is a huge training dataset, this algorithm processes it in mini-batches of size ‘b’. Thus, it is suitable for large datasets and needs fewer iterations than SGD.

If ‘m’ is the number of training examples, then with b == m Mini Batch Gradient Descent is the same as Batch Gradient Descent, and with b == 1 it is the same as Stochastic Gradient Descent.
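The relationship between the three types can be sketched with a single function whose batch size is a parameter. This is a minimal illustration, assuming a straight-line model y = w * x + c, a mean squared error cost, and a small noise-free toy dataset. None of these come from a particular library.

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size, alpha=0.05, epochs=2000, seed=0):
    """Fit y = w * x + c by minimising the mean squared error.

    batch_size = 1        -> Stochastic Gradient Descent
    batch_size = len(xs)  -> Batch Gradient Descent
    anything in between   -> Mini Batch Gradient Descent
    """
    rng = random.Random(seed)
    w, c = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # randomise the order of the training examples
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad_w = sum(2 * (w * xs[i] + c - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_c = sum(2 * (w * xs[i] + c - ys[i]) for i in batch) / len(batch)
            w -= alpha * grad_w
            c -= alpha * grad_c
    return w, c


# Toy, noise-free data generated from y = 2x + 1 (an illustrative assumption).
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]

results = {}
for b in (1, 5, len(xs)):
    results[b] = minibatch_gradient_descent(xs, ys, batch_size=b)
    print(b, results[b])
```

All three settings should recover a slope near 2 and an intercept near 1 on this toy data. What differs is how many updates are made per pass over the data: 20 for b = 1, 4 for b = 5, and 1 for b = 20.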

Variants of Gradient Descent in Machine Learning

Several other algorithms have been developed on the basis of Gradient Descent. A few of them are summarized below.

Vanilla Gradient Descent

This is the simplest form of the Gradient Descent technique. The name vanilla means pure, without any additions. Here, small steps are taken towards the minimum by calculating the gradient of the cost function. As in the procedure described earlier, the update rule is given by,

coefficient = coefficient - (alpha * delta)

Gradient Descent with Momentum

In this case, the algorithm takes the previous step into account before taking the next one. This is done by introducing a new term, which is the product of the previous update (the total change applied to the coefficient in the previous step) and a constant known as the momentum. The weight update rule is given by,

update = alpha * delta

velocity = previous_update * momentum

coefficient = coefficient + velocity - update
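The momentum rule can be sketched in Python as follows. Here previous_change is a hypothetical name for the total change applied to the coefficient in the previous step, and the cost f(x) = (x - 3)^2 and the values of alpha and momentum are illustrative assumptions.

```python
def derivative(coefficient):
    """Slope of the hypothetical cost f(x) = (x - 3)^2."""
    return 2.0 * (coefficient - 3.0)


def momentum_descent(alpha=0.1, momentum=0.9, iterations=300):
    coefficient = 0.0
    previous_change = 0.0  # nothing has been applied before the first step
    for _ in range(iterations):
        delta = derivative(coefficient)
        update = alpha * delta
        velocity = momentum * previous_change
        change = velocity - update  # total change applied in this step
        coefficient = coefficient + change
        previous_change = change  # remembered for the next step
    return coefficient


result = momentum_descent()
print(result)
```

Because part of the previous change is carried over, the coefficient can overshoot the minimum and swing back a few times before it settles near 3.0.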

ADAGRAD

The term ADAGRAD stands for Adaptive Gradient Algorithm. As the name says, it uses an adaptive technique to update the weights, and it is well suited to sparse data. It gives every parameter its own learning rate, which is adjusted during training according to the gradients that parameter has received so far. For example, parameters that have accumulated large gradients get a smaller learning rate, so that we do not end up overshooting the minimum. Similarly, parameters with small or infrequent gradients keep a larger learning rate, so that they are trained more quickly.
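A minimal sketch of this idea, assuming the common formulation in which each coefficient's step is divided by the square root of its accumulated squared gradients. The two-coefficient cost f(x, y) = x^2 + 10 * y^2 and all parameter values are illustrative assumptions.

```python
import math


def adagrad(gradient, start, alpha=0.5, iterations=500, epsilon=1e-8):
    """Minimise a function of several coefficients, given its gradient."""
    coefficients = list(start)
    squared_sums = [0.0] * len(coefficients)  # running sum of squared gradients
    for _ in range(iterations):
        deltas = gradient(coefficients)
        for i, delta in enumerate(deltas):
            squared_sums[i] += delta ** 2
            # Each coefficient gets its own effective learning rate, which
            # shrinks as that coefficient accumulates large gradients.
            step = alpha / (math.sqrt(squared_sums[i]) + epsilon)
            coefficients[i] -= step * delta
    return coefficients


def gradient(coefficients):
    """Gradient of the hypothetical cost f(x, y) = x^2 + 10 * y^2."""
    x, y = coefficients
    return [2.0 * x, 20.0 * y]


x, y = adagrad(gradient, start=[5.0, 5.0])
print(x, y)
```

Although the gradient along y is ten times steeper than along x, both coefficients move towards 0 at almost the same pace, because each one is scaled by its own gradient history.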

ADAM 

Yet another adaptive optimization algorithm that has its roots in Gradient Descent is ADAM, which stands for Adaptive Moment Estimation. It combines ideas from both the ADAGRAD and the SGD with Momentum algorithms: it keeps per-parameter adaptive learning rates, as ADAGRAD does (but based on a decaying average of squared gradients rather than their full sum), and adds a momentum term on top. In simple terms, ADAM = ADAGRAD + Momentum.

Several other variants of Gradient Descent have been developed and are still being developed, such as AMSGrad and AdaMax.
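The combination can be sketched for a single coefficient as below. This follows the commonly published form of the ADAM update with its usual default values for beta1, beta2, and epsilon. The cost f(x) = (x - 3)^2, the learning rate, and the iteration count are illustrative assumptions.

```python
import math


def adam(derivative, start, alpha=0.1, beta1=0.9, beta2=0.999,
         epsilon=1e-8, iterations=1000):
    coefficient = start
    m = 0.0  # decaying average of gradients (the momentum part)
    v = 0.0  # decaying average of squared gradients (the adaptive part)
    for t in range(1, iterations + 1):
        delta = derivative(coefficient)
        m = beta1 * m + (1 - beta1) * delta
        v = beta2 * v + (1 - beta2) * delta ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction, since m starts at zero
        v_hat = v / (1 - beta2 ** t)  # bias correction, since v starts at zero
        coefficient -= alpha * m_hat / (math.sqrt(v_hat) + epsilon)
    return coefficient


result = adam(lambda x: 2.0 * (x - 3.0), start=0.0)
print(result)
```

The variable m plays the role of the momentum term, while dividing by the square root of v_hat gives the adaptive, per-coefficient step size.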

Conclusion

In this article, we have seen how one of the most commonly used optimization algorithms in Machine Learning, Gradient Descent, works, along with its types and the variants that have been developed from it.


Frequently Asked Questions (FAQs)

1. Where can Gradient Descent Algorithm contribute maximally?

Optimisation is central to the quality of any machine learning algorithm. The Gradient Descent algorithm helps minimise the cost function by iteratively improving the algorithm’s parameters. Although Gradient Descent is used widely in Machine Learning and Deep Learning, which form works best depends on the quantity of data, the number of iterations, the accuracy required, and the time available. For small-scale datasets, Batch Gradient Descent is a good fit. Stochastic Gradient Descent (SGD) tends to be more efficient for larger datasets, while Mini Batch Gradient Descent is used for quicker optimisation.

2. What are the challenges faced in gradient descent?

Gradient Descent is the preferred way to optimise machine learning models by reducing the cost function. However, it has its shortcomings as well. If the gradient becomes very small as it passes back through the layers of a model (a vanishing gradient), the weights and biases barely change, and further iterations do little to train the model. In the opposite case, the error gradient accumulates across the layers and becomes too large to manage, producing huge and unstable updates to the weights and biases. This is called an exploding gradient. Infrastructure requirements and the tuning of the learning rate and momentum also need to be addressed.

3. Does gradient descent always converge?

Convergence is when the gradient descent algorithm successfully minimises its cost function to an optimal level. Gradient Descent tries to minimise the cost function by adjusting the algorithm’s parameters. However, it is not guaranteed to reach the global minimum: on a cost function with several minima it may settle at a local minimum, usually one near its starting point. Another reason for poor convergence is the step size. A larger step size results in more oscillations and may cause the algorithm to overshoot the minimum or even diverge. Hence, gradient descent may not always converge to the best solution, but with a suitable step size it typically lands on a nearby minimum.
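The effect of the step size can be seen in a very small experiment. This is a sketch under assumed settings: the cost f(x) = x^2, the starting point, and the two values of alpha are chosen only for illustration.

```python
def run(alpha, iterations=50):
    """Gradient descent on the hypothetical cost f(x) = x^2, starting at x = 1."""
    x = 1.0
    for _ in range(iterations):
        x = x - alpha * 2.0 * x  # the derivative of x^2 is 2x
    return x


small = run(alpha=0.1)  # shrinks steadily towards the minimum at 0
large = run(alpha=1.1)  # overshoots further on every step and diverges
print(abs(small), abs(large))
```

With the smaller step size each update multiplies x by 0.8, so it decays towards 0. With the larger one each update multiplies x by -1.2, so it flips sign and grows on every step.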