
Markov Chains in Python: Mathematical Foundations, Types and Implementation

By Rohit Sharma

Updated on Jul 11, 2025 | 9 min read | 18.93K+ views


Did you know?

You can simulate a million-step Markov Chain in Python in just 20 milliseconds using the QuantEcon library, over 60 times faster than a typical homemade implementation!

Markov chains in Python are used to model systems that move from one state to another, where the next state depends only on the current one, not the full history. Think of predicting tomorrow’s weather based on today’s.

But many tutorials skip the math or rush through the code, leaving you stuck between theory and practice. In this guide, you'll learn how to actually build and apply Markov chains in Python with clear, working examples.

Build your AI and machine learning skills with upGrad’s online machine learning courses. Learn Markov chains, probabilistic modeling, and much more. Take the next step in your learning journey!

What is a Markov Chain? Types and Examples

You're texting someone, and your phone’s autocomplete kicks in: “I’m going to the…” and it suggests “store.” That’s a Markov chain at work. Your phone predicts the next word based on the one you just typed, not the entire sentence.

Working with Markov chains in Python isn’t just about setting up transitions. You need the right structure, validation steps, and checks to build reliable models.

Markov chains in Python are all about probability. You have a set of states and the chance of moving from one to another. The math behind it? Transition matrices, state vectors, and the idea that the future depends only on the present. 

Let’s break that down properly.

1. States

A state is one possible condition or outcome in your system. Think “Sunny,” “Rainy,” or “Cloudy” if you’re modeling weather. These states are often labeled as integers or strings in your program. You can have any number of them, but you’ll usually work with a finite set.

Example: 

states = ["Sunny", "Cloudy", "Rainy"]

2. Transitions

A transition is the move from one state to another. The key idea in a Markov chain is that the next state depends only on the current one. Not what happened two steps back. Just the present. That’s called the Markov property.

This makes things much easier to model, especially with code.
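For instance, here’s a minimal sketch of a single transition step. The probabilities below are made up for illustration:

import numpy as np

states = ["Sunny", "Cloudy", "Rainy"]

# Hypothetical probabilities of moving from the current state ("Sunny")
# to each possible next state
probs_from_sunny = [0.6, 0.3, 0.1]

# The next state is sampled using only the current state's probabilities,
# not any earlier history; that's the Markov property in action
next_state = np.random.choice(states, p=probs_from_sunny)
print(next_state)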

3. Transition Matrix

Now to the math. The transition matrix holds the probabilities of moving from one state to another. Each row represents a current state. Each column shows where you might go next.

The sum of every row should equal 1. You’re distributing 100% of the probability across all possible next moves.

Example: 

import numpy as np

# Rows: current state, Columns: next state
transition_matrix = np.array([
    [0.6, 0.3, 0.1],  # From Sunny
    [0.4, 0.4, 0.2],  # From Cloudy
    [0.2, 0.5, 0.3],  # From Rainy
])

If the sum of a row isn’t 1, your chain won’t work right. This is a common mistake, especially when building the matrix from raw data.

4. Initial State Vector

This is your starting point. It tells the model where it begins. If your model always starts with “Sunny,” the vector might look like this: 

initial_state = np.array([1, 0, 0])

This means 100% chance of starting in the Sunny state.

You can multiply this with the transition matrix to get the next state probabilities: 

next_state = np.dot(initial_state, transition_matrix)

You keep repeating this multiplication to simulate more steps.
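For example, here’s a minimal sketch that simulates 10 steps, reusing the transition_matrix and initial_state defined above (the step count is arbitrary):

# Repeatedly multiply the state vector by the transition matrix
state = initial_state
for _ in range(10):
    state = np.dot(state, transition_matrix)

print(state)  # probabilities of being Sunny, Cloudy, or Rainy after 10 steps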

5. Markov Property

This is the rule that holds the whole model together: 

P(next state | current and all previous states) = P(next state | current state)

The future only depends on now. This makes modeling simpler, and also limits how much memory or historical data your program needs.

 

If you want to build your AI skills and apply them to Markov chains, behavioral modeling, and sequential prediction, enroll in upGrad’s DBA in Emerging Technologies with Concentration in Generative AI. Grasp the techniques behind intelligent, data-driven applications. Start today!

Now that you understand the math, let’s take a look at the different types of Markov chains. 

Types of Markov Chains in Python

You might be wondering why your simulation doesn't behave as expected. Maybe it keeps getting stuck in a state. Maybe it never stabilizes. 

That’s usually because the type of Markov chain you’re working with behaves differently. So before you write more code, you need to know what kind of chain you’re building.

Let’s break them down.

1. Discrete-Time vs Continuous-Time


Most tutorials focus on discrete-time Markov chains. This means time moves in fixed steps, like ticks on a clock.

  • Discrete-time

State changes happen at each step. Think: 1 second, 2 seconds, 3 seconds...

Uses a transition probability matrix P. You compute:

P(X_{n+1} = j \mid X_n = i)

where X_n is the state at step n.

  • Continuous-time

State changes can happen at any moment, not just fixed steps. These are used more in physics, queueing models, or advanced simulations.

Uses a rate matrix or generator matrix Q. Transitions are governed by time-dependent probabilities:

\frac{d}{dt} P(t) = P(t) Q

This introduces differential equations and often needs more complex tools to solve.

If you're working with Python and want to simulate behavior step by step, stick with discrete-time for now.

If your model needs to simulate real-time events or time gaps between transitions, only then consider continuous-time, but expect a steeper learning curve and more math.
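If you do want a taste of the continuous-time case, here’s a minimal sketch using SciPy’s matrix exponential. The generator matrix Q below is a made-up example (off-diagonal entries are transition rates, each row sums to 0):

import numpy as np
from scipy.linalg import expm

# Hypothetical generator matrix Q: off-diagonal entries are rates,
# each row sums to 0
Q = np.array([
    [-0.5,  0.3,  0.2],
    [ 0.4, -0.6,  0.2],
    [ 0.1,  0.4, -0.5],
])

# P(t) = expm(Qt) solves dP(t)/dt = P(t)Q; here t = 2.0 time units
P_t = expm(Q * 2.0)
print(P_t.sum(axis=1))  # each row of P(t) still sums to 1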

Also Read: Linear Algebra for Machine Learning: Critical Concepts, Why Learn Before ML

2. Finite vs Infinite State Space

One of the first things that can throw you off while reading about Markov chains in Python is the term state space. It sounds abstract, but it’s just the set of all possible states your system can be in.

The size of that state space changes how you build your model, and how you represent it in code. If you try to build a transition matrix without knowing whether your states are finite or not, you’ll either run into memory issues or end up with an incomplete model.

Let’s break it down clearly. 

  • Finite State Space

In a finite state space, the number of possible states is countable and fixed.

Let’s say you’re modeling weather with 3 possible conditions: Sunny, Cloudy, Rainy. That’s a finite set:

S = \{ s_1, s_2, s_3 \} = \{ \text{Sunny}, \text{Cloudy}, \text{Rainy} \}

You can represent the transition probabilities using a matrix P, where P_{ij} is the probability of moving from state i to state j:

P = \begin{pmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{pmatrix}

Each row sums to 1. This is easy to implement using NumPy in Python.

You’ll be using this kind of structure for:

  • Text prediction
  • User behavior modeling
  • Game state simulations
  • Basic recommendation systems

Because the number of states is known, storing the matrix and simulating transitions is straightforward and memory-safe.

Also Read: Pygame Tutorial: A Complete Step-by-Step Guide for Aspiring Game Developers

  • Infinite State Space

An infinite state space means the number of possible states isn’t fixed or easily countable. You can’t use a transition matrix here because it would be infinitely large.

Mathematically:

S = \{ s_1, s_2, s_3, \ldots \}

You’ll often see this in models involving continuous values or processes that don’t have natural bounds, like modeling queues with no upper limit or stochastic processes in physics.

Since you can’t use a matrix, you’d need to rely on more complex data structures or transition functions. This gets abstract and memory-heavy fast. 

Most Python libraries won’t support this directly without building a lot of logic yourself.
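One workable pattern is to replace the matrix with a transition function. Here’s a minimal sketch for an unbounded random walk, where the state space is all integers:

import numpy as np

def step(state):
    # No matrix: transitions are computed on demand from the current state,
    # because the state space {..., -1, 0, 1, ...} is infinite
    return state + np.random.choice([-1, 1])

state = 0
for _ in range(100):
    state = step(state)
print(state)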

Also Read: Comprehensive Guide to Hypothesis in Machine Learning: Key Concepts, Testing and Best Practices

3. Irreducible Chains

You set up your transition matrix, run the simulation, and notice that certain states never show up, no matter how many steps you take. That’s not a bug. That’s a reducible chain.

An irreducible Markov chain is one where every state can eventually reach every other state. It might take multiple steps, but the path exists. This matters because, without irreducibility, your chain might get stuck in a corner of the state space and never leave. 

A Markov chain is irreducible if, for every pair of states i and j, there exists some n ≥ 0 such that:

P^n(i, j) > 0

This means the probability of reaching state j from state i in n steps is greater than zero.

If no such n exists for even one pair, the chain is reducible.

Let’s say you have the following transition matrix: 

import numpy as np

P = np.array([
    [0.5, 0.5, 0.0],  # State A
    [0.3, 0.7, 0.0],  # State B
    [0.0, 0.0, 1.0]   # State C (absorbing and isolated)
])

Here, State C is never reached from A or B. Once entered, it can’t be left. This chain is reducible. Simulations will never include State C unless you start there. 

How to Check for Irreducibility

  • Convert the matrix to a graph and check if it’s strongly connected.
  • Use networkx to visualize and test connectivity.
  • For small matrices, manually inspect if all states are reachable.

Example using networkx: 

import networkx as nx

G = nx.DiGraph()

for i in range(P.shape[0]):
    for j in range(P.shape[1]):
        if P[i, j] > 0:
            G.add_edge(i, j)

print(nx.is_strongly_connected(G))  # Returns False for reducible, True for irreducible
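If you have the QuantEcon library installed (the one mentioned in the intro), it can run the same check in one line. A quick sketch, assuming its MarkovChain API:

import quantecon as qe

mc = qe.MarkovChain(P)
print(mc.is_irreducible)  # False for the reducible matrix above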

Why It Matters

If your chain isn’t irreducible, some states will never be visited unless you explicitly start in them. This affects:

  • Long-term predictions
  • Stationary distributions
  • Simulation reliability

4. Periodic vs Aperiodic

Say your simulation keeps bouncing between two states, over and over, never settling. You didn’t mess up the code. That’s a sign your chain is periodic.

The period of a state is the greatest common divisor of all the step counts at which the chain can return to it. If a state can only return in 2, 4, 6... steps, its period is 2.

If a state can return at irregular intervals, say 2, 3, and 4 steps, the gcd is 1 and the state is aperiodic.

Aperiodic chains are what you want when you're trying to reach a steady state. If your chain is periodic, your simulation may keep cycling instead of converging.  

The period of state i is:

d(i) = \gcd \{ n \geq 1 : P^n(i, i) > 0 \}

If d(i)=1, the state is aperiodic. If d(i)>1, it's periodic.

For a chain to be considered aperiodic, every state must be aperiodic (in an irreducible chain, all states share the same period, so checking one state is enough).
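In code, networkx can run this check for you once the chain is represented as a directed graph. A quick sketch, reusing the graph-building loop from the irreducibility section:

import networkx as nx
import numpy as np

# Build a directed graph with an edge for every possible transition
G = nx.DiGraph()
for i in range(P.shape[0]):
    for j in range(P.shape[1]):
        if P[i, j] > 0:
            G.add_edge(i, j)

print(nx.is_aperiodic(G))  # True if the chain has period 1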

What can you do about it? 

If your chain is periodic and that’s not intended, add a small probability of staying in the same state. That breaks the cycle and makes the chain aperiodic. For example: 

# Original: Only alternates between A and B
P = np.array([
    [0.0, 1.0],
    [1.0, 0.0]
])

# Modified: Adds probability of staying
P = np.array([
    [0.1, 0.9],
    [0.9, 0.1]
])

Even a small tweak like this can fix convergence issues without changing your model’s overall behavior too much.

5. Absorbing Chains

When your simulation hits a state and just stops changing, you’re probably dealing with an absorbing chain. These chains are built to end.

An absorbing state is one where the probability of staying is 1, and there’s no way out. Once the process enters that state, it stays there forever.

You’ll run into these often in models with outcomes like failure, success, exit, or death, basically anything with a point of no return.  

A state i is absorbing if:

P(i, i) = 1 \quad \text{and} \quad P(i, j) = 0 \ \text{for } j \neq i

For example: 

import numpy as np

P = np.array([
    [0.5, 0.5, 0.0],
    [0.3, 0.4, 0.3],
    [0.0, 0.0, 1.0]  # Absorbing state
])

State 2 is absorbing. It traps the process once entered.
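You can also detect absorbing states programmatically. A minimal sketch (the helper name is just a suggestion):

import numpy as np

def absorbing_states(P):
    # A state is absorbing when its self-transition probability is 1,
    # which forces every other entry in that row to be 0
    return [i for i in range(P.shape[0]) if np.isclose(P[i, i], 1.0)]

print(absorbing_states(P))  # [2] for the matrix above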

Use Cases

  • Board games (Game Over)
  • Credit risk models (defaulted state)
  • Customer churn (once lost, they don’t return)
  • Decision-making models with terminal outcomes

What to Watch Out For

If your model ends too early or always settles in the same place, check for hidden absorbing states. These can sneak in when your transition matrix isn’t carefully constructed, especially if you're working with sparse or imbalanced data.

Absorbing chains aren't broken; they're purposeful. Just make sure you're using them where they make sense.

Struggling to choose the right machine learning technique for your project? Check out upGrad’s Executive Programme in Generative AI for Leaders, where you’ll explore essential topics like predictive modeling, data calibration, and much more. Start today!

Let’s put everything together with a real-life example: modeling user behavior in a subscription service using Markov chains in Python.


Example Implementation: Predicting User Behavior with Markov Chains

Markov chains in Python aren’t just for toy problems or textbook exercises. They can be useful in modeling how users interact with digital products over time. 

Let’s walk through a hands-on implementation of a Markov chain model that predicts how users behave in a subscription-based service. This is a common setup for platforms like streaming apps, SaaS tools, or learning platforms, where users move through different stages of engagement.

Instead of just simulating coin tosses or weather, we’ll model transitions between meaningful states:

  • Browsing: Users exploring your product but haven’t subscribed yet
  • Subscribed: Active paying users
  • Paused: Temporarily inactive or on hold
  • Churned: Users who’ve left and won’t return

This is a finite state space, which makes it easier to represent in code using a transition matrix. It also reflects real business concerns like churn, re-engagement, and customer retention. 

You’ll simulate how a population of users shifts between these states over time and visualize how the distribution stabilizes. 

Step 1: Define the States and Transition Matrix

Start by setting up your states and how users move between them. We'll assume this behavior is based on historical data or business logic. 

import numpy as np

states = ["Browsing", "Subscribed", "Paused", "Churned"]

# Each row represents the current state
# Each column represents the next state
transition_matrix = np.array([
    [0.4, 0.4, 0.1, 0.1],  # From Browsing
    [0.0, 0.7, 0.2, 0.1],  # From Subscribed
    [0.1, 0.5, 0.3, 0.1],  # From Paused
    [0.0, 0.0, 0.0, 1.0]   # From Churned (absorbing state)
])

This matrix says, for example, that a user who is currently “Subscribed” has a 70% chance of staying subscribed, a 20% chance of pausing, and a 10% chance of churning in the next step. 

Step 2: Define the Initial User Distribution

Now, simulate an initial distribution of users. Let’s say you’re starting with 1000 users:

  • 700 are browsing
  • 200 are already subscribed
  • 50 are paused
  • 50 have already churned

Express this as a probability vector: 

initial_state = np.array([0.7, 0.2, 0.05, 0.05])

This vector should always sum to 1. 

Step 3: Simulate the Markov Chain Over Time

We'll write a function to simulate how this distribution evolves over a series of time steps. 

def simulate_markov_chain(P, state_vector, steps):
    history = [state_vector]
    current = state_vector.copy()
    
    for _ in range(steps):
        current = np.dot(current, P)
        history.append(current)
    
    return np.array(history)

history = simulate_markov_chain(transition_matrix, initial_state, steps=20)

Each iteration represents a new time period, say, a week or a month.

Step 4: Visualize the Results

Let’s see how the user states evolve over time. You’ll use matplotlib to plot the changes. 

import matplotlib.pyplot as plt

weeks = list(range(21))

for i, state in enumerate(states):
    plt.plot(weeks, history[:, i], label=state)

plt.xlabel("Weeks")
plt.ylabel("Proportion of Users")
plt.title("User Behavior Over Time")
plt.legend()
plt.grid(True)
plt.show()

Output: a line chart of the proportion of users in each of the four states over the 20 simulated weeks (the trends are summarized in the explanation below).

This graph helps you answer questions like:

  • How quickly do users churn?
  • Do users tend to stabilize in the “Subscribed” or “Paused” state?
  • What percentage eventually move into the “Churned” state?

Explanation:

  • States Defined: Browsing, Subscribed, Paused, Churned represent different user stages in a subscription service.
  • Transition Matrix: Probabilities of moving between states. For example, a “Subscribed” user has a 70% chance to stay subscribed, 20% to pause, and 10% to churn.
  • Initial Distribution: 70% Browsing, 20% Subscribed, 5% Paused, 5% Churned.
  • Simulation Function: Runs the Markov chain over 20 time steps using matrix multiplication. Captures how user distribution changes with each step.
  • Plot Output:
    • X-axis: Time steps (weeks)
    • Y-axis: Proportion of users in each state
    • “Churned” steadily increases and eventually dominates
    • “Subscribed” stabilizes for a while, then declines
    • “Browsing” and “Paused” fluctuate early on, then drop

Also Read: Must-Know Data Visualization Tools for Data Scientists

Debugging and Common Pitfalls

Once you start experimenting with your own Markov chains, it’s easy to run into problems that don’t always throw errors but still give you weird results. Maybe your simulation never settles. Maybe your churned users somehow reappear. 

These issues usually come down to mistakes in the transition matrix or logic gaps in the setup. 

1. Rows Not Summing to 1

Your transition matrix must be row-stochastic. If any row doesn’t sum to 1, your model won’t represent valid probabilities. 

Quick check

np.allclose(transition_matrix.sum(axis=1), 1) 

2. Negative or Invalid Probabilities

Every element in the matrix must be between 0 and 1. Even a tiny negative value can cause unexpected behavior. 

Fix

assert np.all((transition_matrix >= 0) & (transition_matrix <= 1)) 

3. Periodic Chains That Don’t Converge

If your chain cycles between a few states (like A → B → A), it may never reach a stable distribution. 

Tip: Add a small chance of staying in the same state to break the cycle and make the chain aperiodic. 
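One simple way to do that (a sketch; the self-loop weight alpha is an arbitrary choice) is to blend the matrix with the identity. Both terms are row-stochastic, so the rows still sum to 1:

import numpy as np

alpha = 0.1  # small, arbitrary probability of staying in the same state
P_lazy = alpha * np.eye(P.shape[0]) + (1 - alpha) * P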

4. Churned or Absorbing States Not Defined Properly

If your absorbing state (like “Churned”) isn’t truly absorbing, users might “escape” from it over time. 

Fix

# For a true absorbing state:
transition_matrix[3] = [0, 0, 0, 1]  # assuming 'Churned' is index 3

5. Mismatched Matrix and State Count

If the number of states doesn’t match the size of your matrix, transitions won’t map correctly. 

Check

assert transition_matrix.shape == (len(states), len(states))

6. Wrong Initial Distribution

If your initial state vector doesn’t sum to 1, your results will drift or inflate unnaturally. 

Fix

initial_state = initial_state / initial_state.sum()

Also Read: Python Challenges for Beginners

These checks are quick to run and can save hours of confusion. Add them before simulating anything complex.
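If you’d rather run them all at once, here’s a minimal sketch of a combined validation helper (the function name and messages are just suggestions), reusing the names from the subscription example:

import numpy as np

def validate_chain(P, initial_state, states):
    assert P.shape == (len(states), len(states)), "matrix size must match state count"
    assert np.all((P >= 0) & (P <= 1)), "probabilities must lie between 0 and 1"
    assert np.allclose(P.sum(axis=1), 1), "every row must sum to 1"
    assert np.isclose(initial_state.sum(), 1), "initial distribution must sum to 1"

validate_chain(transition_matrix, initial_state, states)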

As you apply this to your own projects, keep your data clean, validate your transition logic, and remember that Markov chains in Python offer probabilities, not certainties. 

Check out upGrad’s LL.M. in AI and Emerging Technologies (Blended Learning Program), where you'll explore the intersection of law, technology, and AI, including how probabilistic models are used in decision-making systems and digital policy. Start today! 

If you want to go further, look into computing steady-state distributions, modeling expected time to absorption, or exploring Hidden Markov Models and Markov Decision Processes for more complex systems. 
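As a starting point on steady states, here’s a minimal sketch that computes the stationary distribution as the left eigenvector of P for eigenvalue 1 (this assumes an irreducible, aperiodic chain, so a unique stationary distribution exists):

import numpy as np

def stationary_distribution(P):
    # Solve pi @ P = pi with pi summing to 1, via the left eigenvector
    # of P associated with eigenvalue 1
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()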

Advance Your Machine Learning Skills with upGrad!

Projects like modeling user churn or predicting subscription behavior offer unique learning experiences with Markov chains in Python. These models capture how systems shift between states over time, reflecting real-life behavior through a structured, probability-based approach. But applying Markov chains comes with its own set of challenges.

To truly excel, focus on designing accurate transition matrices and validating your assumptions with simulations and visual checks. For further growth, explore steady-state distributions, Hidden Markov Models, or combining Markov chains with machine learning for smarter predictions.

In addition to the courses mentioned above, upGrad also offers free courses that can help you enhance your skills.

Feeling uncertain about your next step? Get personalized career counseling to identify the best opportunities for you. Visit upGrad’s offline centers for expert mentorship, hands-on workshops, and networking sessions to connect you with industry leaders!


Reference:
https://python.quantecon.org/finite_markov.html

Frequently Asked Questions (FAQs)

1. What kind of datasets work best when using Markov chains in Python?

2. Can I use Markov chains in Python for continuous data like temperature or stock prices?

3. How do I decide the right number of states for my Markov chain model?

4. What are the limitations of using Markov chains in Python for real-time prediction?

5. How do I validate the accuracy of a Markov chain model?

6. Can I use Markov chains in Python to forecast events like customer churn or product purchases?

7. What should I do if my transition matrix is sparse or incomplete?

8. Are Markov chains in Python scalable for large user datasets?

9. Can I combine Markov chains with other machine learning models?

10. How do I handle missing state transitions in my dataset when using Markov chains in Python?

11. What kind of visualizations are most useful when working with Markov chains in Python?

Rohit Sharma

763 articles published

Rohit Sharma shares insights, skill building advice, and practical tips tailored for professionals aiming to achieve their career goals.

