Q Learning in Python: What is it, Definitions [Coding Examples]

Updated on 17 January, 2024

Reinforcement learning is a setting in which a learning agent learns to behave optimally in its environment through repeated interaction. The agent passes through various situations, known as states, and receives rewards for the actions it takes. Reinforcement learning has many practical applications.

Reinforcement learning has many algorithms, and Q learning is among the most popular. In this article, we’ll discuss what this algorithm is and how it works.

So, without further ado, let’s get started. 

What is Q Learning?

Q learning is a reinforcement learning algorithm that focuses on finding the best action to take in a particular situation. It’s off-policy: the policy it learns about (the greedy policy with respect to its current Q-values) can differ from the policy the agent follows while collecting experience, such as an exploratory epsilon-greedy policy. It aims to learn a policy that maximizes the agent’s total reward. It’s a simple form of reinforcement learning that uses action values (or Q-values) to improve the learning agent’s behaviour.

Q learning is one of the most popular algorithms in reinforcement learning, as it’s relatively easy to understand and implement. The ‘Q’ in Q learning stands for quality. As we mentioned earlier, Q learning focuses on finding the best action for a particular situation, and the quality expresses how useful a specific action is in terms of the future reward it can lead to.

Important Definitions

Before we discuss how it works, let’s take a look at some essential concepts of Q learning.

Q-Values

Q-values are also known as action values. They are represented by Q(S, A) and estimate how good it is to take action A in state S. The model computes this estimate iteratively using the Temporal Difference (TD) update rule discussed later in this section.
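
As a rough illustration, the Q-values of a small problem can be kept in a dictionary that maps each state to an array with one entry per action. The state name, the action meanings and the numbers below are made up purely for illustration.

```python
from collections import defaultdict

import numpy as np

NUM_ACTIONS = 4  # hypothetical action set: 0=up, 1=right, 2=down, 3=left

# Q[state] is an array holding one estimated value per action.
# States that have not been seen yet start with all-zero estimates.
Q = defaultdict(lambda: np.zeros(NUM_ACTIONS))

# Hand-written estimates for one illustrative state.
Q["s0"][:] = [0.1, 0.7, -0.2, 0.0]

# Q(S, A): the estimated value of taking action 1 in state "s0".
q_s0_right = Q["s0"][1]

# The greedy action is the one with the highest estimate.
greedy_action = int(np.argmax(Q["s0"]))

print(q_s0_right, greedy_action)
```

Looking up a state that has never been visited simply returns a row of zeros, which is the same behaviour the defaultdict-based Q-table in the full example relies on.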

Episodes and Rewards

An agent begins from a start state and goes through a series of transitions, moving from its current state to the next one according to its actions and the environment. Whenever the agent takes an action, it receives a reward. An episode is complete when the agent reaches a terminal state, from which no further transitions are possible.
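
To make the idea of an episode concrete, here is a minimal sketch built around a made-up corridor environment. The class CorridorEnv, its reward of -1 per step, and the helper run_episode are all assumptions for illustration and are not part of gym.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: cells 0..4, start at 0, goal at 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Every episode begins from the start state.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        # The walls clamp the position to the corridor.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 0.0 if done else -1.0  # -1 per step until the goal
        return self.state, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Run one episode and return (total_reward, number_of_steps)."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        state, reward, done = env.step(choose_action(state))
        total_reward += reward
        steps += 1
        if done:  # terminal state reached: the episode is over
            break
    return total_reward, steps


# An agent that always moves right reaches the goal in four steps.
total, steps = run_episode(CorridorEnv(), lambda state: 1)
print(total, steps)
```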

TD-Update (Temporal Difference)

Here’s the TD-Update or Temporal Difference rule:

Q(S, A) ← Q(S, A) + α (R + γ Q(S’, A’) − Q(S, A))

Here, S represents the agent’s current state, whereas S’ represents the next state. A represents the current action, and A’ represents the best next action according to the current Q-value estimates. R is the reward received for the current action, γ is the discount factor, and α is the step size (learning rate).
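
A single update can be worked through by hand. The sketch below assumes a step size (alpha) of 0.5, a discount factor (gamma) of 0.9 and made-up Q-values; the arithmetic in the comments follows the TD rule directly.

```python
import numpy as np

alpha = 0.5  # step size (learning rate), chosen for illustration
gamma = 0.9  # discount factor, chosen for illustration

# Illustrative estimates for the current state S and the next state S',
# each with two possible actions.
Q = {"S": np.array([1.0, 2.0]), "S_next": np.array([3.0, 4.0])}

action = 0    # the action A taken in S
reward = 1.0  # the reward R received for it

# Target: reward plus discounted value of the best next action.
td_target = reward + gamma * np.max(Q["S_next"])  # 1 + 0.9 * 4 = 4.6

# Error between the target and the current estimate.
td_error = td_target - Q["S"][action]             # 4.6 - 1 = 3.6

# Move the estimate a fraction alpha of the way towards the target.
Q["S"][action] += alpha * td_error                # 1 + 0.5 * 3.6 = 2.8

print(Q["S"])
```

Only the entry for the action that was actually taken changes; the estimate for the other action in S is left untouched.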

Example of Q Learning in Python

The best way to understand Q learning in Python is to work through an example. In this example, we use OpenAI’s gym toolkit and train our model in a gym-style environment. First, you’ll have to install gym. You can do so with the following command:

pip install gym

Now, we’ll import the libraries we’ll need for this example:

import gym
import itertools
import matplotlib
import matplotlib.style
import numpy as np
import pandas as pd
import sys
from collections import defaultdict

# Helper modules that are not part of gym; they must be on your import path
from windy_gridworld import WindyGridworldEnv
import plotting

matplotlib.style.use('ggplot')

All of these imports must resolve before the rest of the code will run. After we’ve imported the libraries, we will create the environment:

env = WindyGridworldEnv() 

Now we’ll create the ε-greedy policy:

def createEpsilonGreedyPolicy(Q, epsilon, num_actions):
    """
    Creates an epsilon-greedy policy based
    on a given Q-function and epsilon.

    Returns a function that takes the state
    as an input and returns the probabilities
    for each action in the form of a numpy array
    of the length of the action space (set of possible actions).
    """
    def policyFunction(state):
        # Spread epsilon evenly over all actions (exploration)
        action_probabilities = np.ones(num_actions,
                dtype=float) * epsilon / num_actions
        # Give the remaining 1 - epsilon to the greedy action
        best_action = np.argmax(Q[state])
        action_probabilities[best_action] += (1.0 - epsilon)
        return action_probabilities

    return policyFunction
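
To see what such a policy returns, the following stand-alone sketch computes the probabilities for a single row of Q-values. The helper name epsilon_greedy_probs and the numbers are assumptions for illustration; the logic mirrors the policy function above.

```python
import numpy as np


def epsilon_greedy_probs(q_values, epsilon):
    """Return epsilon-greedy action probabilities for one state."""
    num_actions = len(q_values)
    # Every action gets an equal share of epsilon.
    probs = np.full(num_actions, epsilon / num_actions)
    # The greedy action also receives the remaining 1 - epsilon.
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs


q_values = np.array([0.0, 2.0, 1.0, -1.0])  # illustrative Q(S, .)
probs = epsilon_greedy_probs(q_values, epsilon=0.1)

# With epsilon = 0.1 and four actions, each non-greedy action gets
# 0.1 / 4 = 0.025 and the greedy action (index 1) gets 0.025 + 0.9 = 0.925.
print(probs)

# Sample an action from the distribution (seeded for repeatability).
rng = np.random.default_rng(0)
action = rng.choice(len(probs), p=probs)
```

Because every action keeps a non-zero probability, the agent still tries actions that currently look worse, which is what lets the Q-value estimates for those actions improve.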

Here’s the code for building a q-learning model:

def qLearning(env, num_episodes, discount_factor=1.0,
              alpha=0.6, epsilon=0.1):
    """
    Q-Learning algorithm: Off-policy TD control.
    Finds the optimal greedy policy while
    following an epsilon-greedy policy.
    """
    # Action value function
    # A nested dictionary that maps
    # state -> (action -> action-value).
    Q = defaultdict(lambda: np.zeros(env.action_space.n))

    # Keeps track of useful statistics
    stats = plotting.EpisodeStats(
        episode_lengths=np.zeros(num_episodes),
        episode_rewards=np.zeros(num_episodes))

    # Create an epsilon-greedy policy function
    # appropriate for the environment's action space
    policy = createEpsilonGreedyPolicy(Q, epsilon, env.action_space.n)

    # For every episode
    for ith_episode in range(num_episodes):

        # Reset the environment and get the start state
        state = env.reset()

        for t in itertools.count():

            # Get probabilities of all actions from the current state
            action_probabilities = policy(state)

            # Choose an action according to
            # the probability distribution
            action = np.random.choice(
                np.arange(len(action_probabilities)),
                p=action_probabilities)

            # Take the action, get the reward, move to the next state
            # (assumes the classic gym API, where step() returns 4 values)
            next_state, reward, done, _ = env.step(action)

            # Update statistics
            stats.episode_rewards[ith_episode] += reward
            stats.episode_lengths[ith_episode] = t

            # TD Update
            best_next_action = np.argmax(Q[next_state])
            td_target = reward + discount_factor * Q[next_state][best_next_action]
            td_delta = td_target - Q[state][action]
            Q[state][action] += alpha * td_delta

            # done is True if the episode terminated
            if done:
                break

            state = next_state

    return Q, stats

Let’s train the model now:

Q, stats = qLearning(env, 1000)

After we’ve created and trained the model, we can plot the essential stats of the same:

plotting.plot_episode_stats(stats)
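
The example above depends on the windy_gridworld and plotting helper modules. If they are not at hand, the same training loop can be tried on a tiny made-up problem. The sketch below is an assumption-laden stand-in, not the article's environment: a five-cell corridor with a reward of -1 per step, tabular Q-values in a NumPy array, and hyperparameters picked for illustration.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2  # corridor cells 0..4; actions: 0=left, 1=right
GOAL = N_STATES - 1


def step(state, action):
    """Hypothetical corridor dynamics: -1 per step, 0 on reaching the goal."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), GOAL)
    done = next_state == GOAL
    reward = 0.0 if done else -1.0
    return next_state, reward, done


def q_learning(num_episodes=300, alpha=0.5, gamma=1.0,
               epsilon=0.1, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    episode_rewards = np.zeros(num_episodes)

    for episode in range(num_episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(Q[state]))

            next_state, reward, done = step(state, action)
            episode_rewards[episode] += reward

            # TD update towards reward + gamma * max_a Q(S', a)
            td_target = reward + gamma * np.max(Q[next_state])
            Q[state, action] += alpha * (td_target - Q[state, action])

            if done:
                break
            state = next_state

    return Q, episode_rewards


Q, episode_rewards = q_learning()

# Greedy action in every non-terminal cell (1 means "move right").
greedy_policy = np.argmax(Q[:GOAL], axis=1)
print(greedy_policy)
print(episode_rewards[-50:].mean())
```

In this corridor the best return from the start is -3 (three penalised steps, then the goal), so the average reward of the later episodes should sit close to that value, slightly lower because the agent keeps exploring.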

Use this code to run the model and plot the graph. What kind of results do you see? Share your results with us, and if you face any confusion or doubts, let us know. 

Final Thoughts

When you plot the graph, you’ll see that the reward per episode increases progressively over time. After a certain number of episodes, the curve levels off near the highest reward achievable per episode. What does this indicate?

It means your model has learned to increase the total reward it can earn in an episode by behaving close to optimally. This ability to learn good behaviour from interaction alone is a large part of why Q learning sees applications in so many areas.

Frequently Asked Questions (FAQs)

1. What are the drawbacks of reinforcement learning?

1. Too much reinforcement can lead to an overload of states, which can lower the quality of the results.
2. Reinforcement learning is not recommended for simple problems.
3. Reinforcement learning needs a large amount of data and computation.
4. Reinforcement learning has its own set of unique and very complicated obstacles, such as difficult training design and the balance between exploration and exploitation.

2. Is Q learning model-based?

No, Q learning isn't dependent on models. Q-learning is a model-free reinforcement learning technique for determining the value of a certain action in a given state. Because it is model-free, it does not require a model of the environment, can be used in a variety of contexts, and can adapt to new and unknown conditions. It can handle problems involving stochastic transitions and rewards without special adaptations. Q-learning is also a value-based algorithm: value-based algorithms update the value function using an equation, in particular the Bellman equation.

3. How are Q learning and SARSA different from each other?

SARSA learns a near-optimal policy while exploring, whereas Q-learning learns the optimal policy directly. SARSA is on-policy: it learns action values in relation to the policy it is following. Q-learning is off-policy: it learns action values in relation to the greedy policy. Both converge to the real value function under some similar conditions, but at different speeds. Q-learning tends to converge a little more slowly, but it can continue to learn while the policy being followed changes. When coupled with linear approximation, Q-learning is not guaranteed to converge. SARSA will take penalties from exploratory steps into account when approaching convergence, while Q-learning will not. If there's a chance of a large negative reward close to the optimal path, Q-learning will tend to trigger it while exploring, whereas SARSA will tend to avoid a risky optimal path and only learn to use it after the exploration parameters are decreased.
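
The difference shows up in a single line of each algorithm: the target that the current estimate is moved towards. The sketch below uses made-up numbers and assumes the agent's next action happens to be an exploratory one with a poor value.

```python
import numpy as np

gamma = 0.9    # discount factor, chosen for illustration
reward = -1.0  # reward for the current step

# Illustrative Q(S', .) for two actions: action 0 is safe,
# action 1 leads to a large penalty.
q_next = np.array([2.0, -10.0])

# Q-learning (off-policy): bootstrap from the best next action,
# regardless of what the agent will actually do next.
q_learning_target = reward + gamma * np.max(q_next)  # -1 + 0.9 * 2 = 0.8

# SARSA (on-policy): bootstrap from the action A' actually chosen next.
# Here we assume exploration picked the risky action 1.
next_action = 1
sarsa_target = reward + gamma * q_next[next_action]  # -1 + 0.9 * -10 = -10

print(q_learning_target, sarsa_target)
```

Because SARSA's target includes the cost of exploratory moves, its estimates near risky states are pulled down, which is why it tends to prefer safer paths while exploration is still active.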