Excel with Python: Automation, Data Handling, and Advanced Techniques for 2025

By Rohit Sharma

Updated on Mar 19, 2025 | 18 min read | 8.0k views

Excel with Python opens up powerful ways to automate tasks and manage data effortlessly. Working with Excel files in Python allows you to handle large datasets, perform complex calculations, and automate repetitive tasks.

Python makes it easy to work with Excel files and handle data in a much more organized and streamlined way. By the end, you’ll learn advanced techniques to save time and increase your productivity with Excel and Python.

Excel with Python: Introduction to Data Efficiency and Automation

Excel is a powerful tool for managing and analyzing data, but doing tasks manually can be time-consuming and prone to errors. This is where Python comes in. By using Python, you can automate repetitive tasks, manipulate data, and create reports with a few lines of code, making your work faster and more efficient.

This section shows how Python can take your Excel work further by automating tasks and improving data efficiency.

Benefits of using Python for Excel tasks

  • Automation: Eliminate manual work by automating repetitive Excel tasks such as data entry, formatting, or generating reports.
  • Data Manipulation: Python enables you to clean, transform, and analyze data in Excel files with ease.
  • Advanced Functions: Python can help you perform complex calculations or analysis, such as pivot tables or custom formulas, that might be difficult or time-consuming in Excel.
  • Scalability: Python is highly scalable, so it’s great for working with large datasets or automating tasks across multiple files.

Key libraries: pandas, openpyxl, xlsxwriter, xlrd, pyexcel

Python offers several libraries that make working with Excel files easy and powerful. Here are some of the most commonly used libraries:

  • pandas: Best for data analysis and manipulation. It's commonly used to load, process, and export data in Excel files.
  • openpyxl: Useful for reading and writing Excel files (.xlsx). You can modify existing files, add new sheets, or change cell values.
  • xlsxwriter: Great for creating new Excel files with advanced formatting and charts.
  • xlrd: Older library, now primarily used for reading Excel files (.xls). Replace xlrd with openpyxl for .xlsx files, as xlrd no longer supports them since late 2020.
  • pyexcel: A lightweight option for reading and writing Excel files quickly and easily.

Next, let’s set up Python for Excel and get you started.

How Do You Set Up Python for Excel Work?

Before diving into the world of Excel with Python, let’s get your environment set up. First, you’ll need a few libraries to work with Excel files in Python. The libraries we’ll use are pandas, openpyxl, and xlsxwriter, which will help you handle data and manipulate Excel files efficiently. Below is the process to install and use them.

Installing Required Libraries

To get started, you'll need to install the necessary libraries using pip. Here’s how you can do it:

  1. Open your terminal or command prompt.
  2. Type the following command to install the required libraries: 
pip install pandas openpyxl xlsxwriter
  • pandas: Used for data analysis and manipulation.
  • openpyxl: Enables Python to read and write Excel files (.xlsx).
  • xlsxwriter: Useful for creating new Excel files and applying advanced formatting.

After installing libraries with pip install, verify them by running:

import pandas as pd  

print(pd.__version__)  

This helps troubleshoot version compatibility issues.
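
The check above covers pandas only. As a minimal sketch, the standard library's importlib.metadata can report the installed version of every library from the install command. The helper name installed_versions is illustrative and not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the three libraries installed earlier in this section
for name, ver in installed_versions(["pandas", "openpyxl", "xlsxwriter"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

A None entry means pip did not install that package into the interpreter you are running, which is a common cause of import errors when several Python installations exist side by side.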

Importing Libraries in Python

Once the libraries are installed, you can start using them in your Python script. Here’s how to import the necessary libraries: 

import pandas as pd  # Import pandas for data manipulation
import openpyxl  # Import openpyxl to read and write Excel files
import xlsxwriter  # Import xlsxwriter to create new Excel files and add formatting

Loading an Excel File into Python

Now, let’s load an Excel file in Python and explore its contents using pandas. This example assumes you already have an Excel file named sample_data.xlsx. 

The file contains a simple dataset with three columns:

  • Product: The name of the product
  • Quantity: The number of units available
  • Unit Price: The price per unit

Here’s the code: 

# Load an Excel file into a pandas DataFrame
file_path = 'sample_data.xlsx'  # Specify the file path
df = pd.read_excel(file_path)  # Use pandas to read the Excel file into a DataFrame

# Display the first few rows of the file to check its content
print(df.head())  # This will display the first 5 rows of the Excel file

Output:

Let’s assume your sample_data.xlsx file contains a simple dataset like this:

Product  Quantity  Unit Price
A        10        5
B        15        7
C        12        6

After running the code above, the output would look like this: 

  Product  Quantity  Unit Price
0       A        10           5
1       B        15           7
2       C        12           6

Explanation:

  • pip install pandas openpyxl xlsxwriter: This installs the libraries needed to read, write, and manipulate Excel files in Python.
  • import pandas as pd: This imports pandas for data manipulation and analysis. It's the go-to library for working with datasets in Python.
  • import openpyxl: Openpyxl is used for reading and writing Excel files in Python. It works with .xlsx files.
  • import xlsxwriter: This library is mainly used for creating new Excel files and adding advanced formatting like charts or cell styles.
  • pd.read_excel(file_path): This function reads the Excel file from the specified path and loads it into a pandas DataFrame.
  • df.head(): Displays the first 5 rows of the data so you can quickly inspect the contents of your Excel file.

Now that you're set up, let's dive into working with Excel files and making Python your go-to tool for data handling.

How Do You Work with Excel Files in Python?

Now that your environment is set up, it’s time to get hands-on with Excel files in Python. Whether you're reading, writing, or modifying Excel files, Python makes these tasks straightforward and efficient.  

Reading Excel Files

Reading data from an Excel file in Python is a breeze with pandas. Here are the steps:

Using pandas.read_excel()

This method is the most commonly used to load data from an Excel file into a pandas DataFrame, which is ideal for analyzing and manipulating the data. 

import pandas as pd  # Import pandas library

# Read an Excel file into a DataFrame
file_path = 'sample_data.xlsx'
df = pd.read_excel(file_path)  # Load data from the Excel file into pandas DataFrame

# Display the first few rows
print(df.head())

Reading Specific Sheets

If your Excel file contains multiple sheets, you can specify the sheet you want to load. 

# Read a specific sheet
df = pd.read_excel(file_path, sheet_name='Sheet2')  # Specify the sheet name

# Display the first few rows of the specified sheet
print(df.head())
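
If you need every sheet rather than one, pandas accepts sheet_name=None and returns a dictionary keyed by sheet name. The sketch below builds its own two-sheet workbook in a temporary folder so it runs on its own; the file name and the sheet names Stock and Prices are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "multi_sheet.xlsx")
stock = pd.DataFrame({"Product": ["A", "B"], "Quantity": [10, 15]})
prices = pd.DataFrame({"Product": ["A", "B"], "Unit Price": [5, 7]})
with pd.ExcelWriter(path) as writer:
    stock.to_excel(writer, sheet_name="Stock", index=False)
    prices.to_excel(writer, sheet_name="Prices", index=False)

# sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
sheets = pd.read_excel(path, sheet_name=None)
for name, frame in sheets.items():
    print(name, frame.shape)
```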

Handling Large Datasets Efficiently

When working with large datasets, you may want to load only a portion of the data. You can specify parameters like usecols or nrows to limit what’s loaded. 

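# Read only specific columns
df_columns = pd.read_excel(file_path, usecols=['Product', 'Quantity'])  # Only load 'Product' and 'Quantity' columns

# Read only a specific number of rows
df_rows = pd.read_excel(file_path, nrows=10)  # Load only the first 10 rows

# Combine both to load the first 10 rows of the selected columns
df = pd.read_excel(file_path, usecols=['Product', 'Quantity'], nrows=10)

# Display the results
print(df.head())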

Output:

Assuming the Excel file contains sales data, the output will display something like: 

  Product  Quantity
0       A        10
1       B        15
2       C        12

Writing Data to Excel

Once you've worked with your data, you may want to save it back to an Excel file. Here’s how to do it:

Creating New Excel Files 

# Create a new DataFrame
data = {'Product': ['A', 'B', 'C'], 'Quantity': [10, 15, 12], 'Unit Price': [5, 7, 6]}
df = pd.DataFrame(data)  # Create a new DataFrame from a dictionary

# Write the DataFrame to a new Excel file
df.to_excel('new_sample_data.xlsx', index=False)  # Save without including row indices

Writing DataFrames to Excel

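If you already have a DataFrame and want to add it to an existing Excel file, use pd.ExcelWriter in append mode with the openpyxl engine. The xlsxwriter engine can only create new files, so it would overwrite existing_data.xlsx instead of updating it. The if_sheet_exists option requires pandas 1.3 or later.

with pd.ExcelWriter('existing_data.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)  # Replace 'Sheet1'; other sheets are kept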

Formatting Cells with xlsxwriter

You can add advanced formatting when creating a new Excel file by using xlsxwriter. 

import xlsxwriter  # Import the xlsxwriter library

# Create a new workbook and add a worksheet
workbook = xlsxwriter.Workbook('formatted_data.xlsx')
worksheet = workbook.add_worksheet()

# Define a bold format for the header row
bold = workbook.add_format({'bold': True})

# Write the header cells with the bold format applied
worksheet.write('A1', 'Product', bold)
worksheet.write('B1', 'Quantity', bold)

# Close the workbook to save the file
workbook.close()
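
Beyond bold headers, xlsxwriter formats can carry number formats, fills, and borders, and the worksheet can set column widths and frozen panes. The following is a small sketch with made-up prices and an illustrative file name, written to a temporary folder.

```python
import os
import tempfile

import xlsxwriter

path = os.path.join(tempfile.mkdtemp(), "formatted_prices.xlsx")
workbook = xlsxwriter.Workbook(path)
worksheet = workbook.add_worksheet("Prices")

# Formats are created once on the workbook and reused across cells
header = workbook.add_format({"bold": True, "bg_color": "#DDEBF7", "border": 1})
money = workbook.add_format({"num_format": "#,##0.00"})

rows = [("A", 5), ("B", 7), ("C", 6)]  # illustrative data
worksheet.write_row(0, 0, ["Product", "Unit Price"], header)
for row_index, (product, price) in enumerate(rows, start=1):
    worksheet.write(row_index, 0, product)
    worksheet.write(row_index, 1, price, money)

worksheet.set_column(0, 1, 14)  # Set the width of columns A:B to 14
worksheet.freeze_panes(1, 0)    # Keep the header row visible while scrolling
workbook.close()
```

Creating each format once and passing it to write() keeps the file small, since xlsxwriter stores one style entry per format object.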

Modifying Excel Files

Sometimes you need to update an existing Excel file in Python. Here's how you can edit, add, and remove data:

Editing Existing Files

You can modify data directly within an existing file. Here’s how to change a value: 

import openpyxl  # Import openpyxl for modifying Excel files

# Load the existing Excel file
workbook = openpyxl.load_workbook('sample_data.xlsx')
sheet = workbook.active  # Get the active sheet

# Modify a specific cell
sheet['A2'] = 'Updated Product'  # Change the value in cell A2

# Save the changes
workbook.save('modified_data.xlsx')

Adding and Deleting Sheets

You can also add or remove sheets from an existing Excel file. 

# Add a new sheet
workbook.create_sheet('NewSheet')

# Remove a sheet
workbook.remove(workbook['Sheet1'])  # Removes 'Sheet1' from the workbook

# Save the changes
workbook.save('updated_data.xlsx')

Updating Specific Cell Values

If you need to update specific cells based on certain conditions, you can loop through the rows. 

# Loop through rows and update based on conditions
for row in sheet.iter_rows(min_row=2, max_row=5, min_col=2, max_col=2):  # Specify range
    for cell in row:
        if cell.value == 10:  # If the quantity is 10
            cell.value = 20  # Update the value to 20

# Save the updated file
workbook.save('updated_values.xlsx')

Output:

Assuming your file had Quantity = 10 in some cells, those would be updated to 20 after running the script.
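
The loop above hard-codes rows 2 to 5. As a sketch of the same idea that adapts to any sheet length, the version below uses sheet.max_row and counts how many cells it changed. It builds a workbook in memory with made-up rows, so no file is needed.

```python
from openpyxl import Workbook

# Build an in-memory workbook with illustrative data
workbook = Workbook()
sheet = workbook.active
for row in [("Product", "Quantity"), ("A", 10), ("B", 15), ("C", 10)]:
    sheet.append(row)

updated = 0
# Column 2 holds Quantity; max_row tracks the last used row
for (cell,) in sheet.iter_rows(min_row=2, max_row=sheet.max_row, min_col=2, max_col=2):
    if cell.value == 10:
        cell.value = 20
        updated += 1

quantities = [
    row[0]
    for row in sheet.iter_rows(min_row=2, min_col=2, max_col=2, values_only=True)
]
print(updated, quantities)
```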

Explanation:

  • pandas.read_excel(): This function reads the contents of an Excel file and stores it in a pandas DataFrame, which is easier to manipulate.
  • sheet_name: Allows you to specify which sheet to read if the Excel file contains multiple sheets.
  • usecols and nrows: These parameters help load specific columns or a limited number of rows to save memory when working with large files.
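  • df.to_excel(): This saves a DataFrame to an Excel file. Passing a file path overwrites any existing file at that path; to add a sheet to an existing workbook, write through pd.ExcelWriter in append mode.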
  • xlsxwriter: This library helps you format the output Excel file, such as adding bold text, setting colors, or applying other styles.
  • openpyxl: Used for modifying existing Excel files, allowing you to edit data, add or delete sheets, and change cell values.

Now that you know how to work with Excel files, let’s step up your game by automating tasks and diving into data analysis.

How Can You Automate Excel Tasks and Analyze Data?

Automation helps you save time on manual tasks, while Python’s data analysis tools allow you to process large datasets quickly and efficiently.

Let’s break it down into key areas: Data Manipulation and Analysis, Visualizing Data in Excel, and Automating Excel Tasks.

Data Manipulation and Analysis

Python makes it simple to manipulate and analyze data within Excel files. Here are some techniques for working with your Excel data:

Filtering and Sorting Data

You can filter and sort data in Excel using pandas, which offers a straightforward approach to handle large datasets. 

import pandas as pd  # Import pandas for data manipulation

# Read the Excel file into a DataFrame
file_path = 'sales_data.xlsx'
df = pd.read_excel(file_path)

# Filter data where Quantity > 10
filtered_df = df[df['Quantity'] > 10]

# Sort data by 'Product' column
sorted_df = df.sort_values(by='Product')

# Display the filtered and sorted DataFrame
print(filtered_df.head())
print(sorted_df.head())

Output:

Assuming your sales_data.xlsx file contains sales data, the filtered data would look like: 

  Product  Quantity  Unit Price
1       B        15           7
2       C        12           6

Applying Formulas in Excel Using Python

You can apply calculations to your Excel data in Python using pandas. pandas computes the result in Python and writes plain values to the file, not live Excel formulas. Here’s how to calculate the total sales:

# Compute a 'Total Sales' column from Quantity and Unit Price
df['Total Sales'] = df['Quantity'] * df['Unit Price']

# Save the updated data (the new column holds computed values, not formulas)
df.to_excel('sales_with_total.xlsx', index=False)
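
The pandas code above stores computed values. If the workbook should contain live Excel formulas that recalculate when someone edits a quantity, one option is to write formula strings with openpyxl. This is a sketch using the same three sample products; the file name is illustrative.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "sales_with_formulas.xlsx")

workbook = Workbook()
sheet = workbook.active
sheet.append(["Product", "Quantity", "Unit Price", "Total Sales"])
for product, quantity, price in [("A", 10, 5), ("B", 15, 7), ("C", 12, 6)]:
    sheet.append([product, quantity, price])

# A string starting with '=' is stored as a formula, not as text
for row in range(2, sheet.max_row + 1):
    sheet[f"D{row}"] = f"=B{row}*C{row}"
sheet["D5"] = "=SUM(D2:D4)"
workbook.save(path)

# Reloading shows the stored formula text; openpyxl does not evaluate it
reloaded = load_workbook(path).active
print(reloaded["D2"].value)
```

openpyxl never calculates formulas itself. The results appear once the file is opened in Excel or another spreadsheet application that recalculates it.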

Aggregating Data (Sum, Average, Pivot Tables)

With pandas, you can easily aggregate data using functions like sum() and mean(), as well as pivot tables.

# Calculate the total sum of sales
total_sales = df['Total Sales'].sum()

# Calculate the average sales
average_sales = df['Total Sales'].mean()

# Create a pivot table
pivot_table = df.pivot_table(values='Total Sales', index='Product', aggfunc='sum')

# Print the results
print(total_sales)
print(average_sales)
print(pivot_table)

Output:

The pivot_table will aggregate total sales per product: 

         Total Sales
Product
A                 50
B                105
C                 72
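
When a product appears on more than one row, groupby with agg gives several statistics per product in one call. The sketch below uses an in-memory DataFrame with a made-up fourth row (a second sale of product A) so the aggregation has something to combine.

```python
import pandas as pd

# Illustrative data: product A appears twice
df = pd.DataFrame({
    "Product": ["A", "B", "C", "A"],
    "Quantity": [10, 15, 12, 4],
    "Unit Price": [5, 7, 6, 5],
})
df["Total Sales"] = df["Quantity"] * df["Unit Price"]

# One row per product with the sum, mean, and number of sales
summary = df.groupby("Product")["Total Sales"].agg(["sum", "mean", "count"])
print(summary)
```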

Visualizing Data in Excel

Python excels at data visualization, and you can place charts in your Excel sheets using libraries like xlsxwriter and matplotlib.

Creating Charts and Graphs with Python

You can create various types of charts like bar charts, line graphs, and pie charts using matplotlib, save them as images, and then insert those images into Excel files with xlsxwriter. matplotlib is a separate install (pip install matplotlib).

import xlsxwriter  # Import xlsxwriter for Excel file creation
import matplotlib.pyplot as plt  # Import matplotlib for creating charts

# Create a plot using matplotlib
df.plot(kind='bar', x='Product', y='Total Sales')

# Save the plot as an image
plt.savefig('sales_chart.png')

# Create a new Excel file and add the plot as an image
workbook = xlsxwriter.Workbook('sales_report.xlsx')
worksheet = workbook.add_worksheet()

# Insert the chart into the worksheet
worksheet.insert_image('A1', 'sales_chart.png')

# Save the workbook
workbook.close()

Output:

The sales_report.xlsx file will contain the chart as a static image. The image does not update when the data in the workbook changes.
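
As an alternative to inserting a picture, xlsxwriter can create a native Excel chart that is linked to worksheet cells. The following sketch assumes the three sample products and a sheet named Sales; the file name is illustrative.

```python
import os
import tempfile

import xlsxwriter

path = os.path.join(tempfile.mkdtemp(), "native_chart.xlsx")
workbook = xlsxwriter.Workbook(path)
worksheet = workbook.add_worksheet("Sales")

# Write the data the chart will reference
worksheet.write_row("A1", ["Product", "Total Sales"])
for row_index, row in enumerate([("A", 50), ("B", 105), ("C", 72)], start=1):
    worksheet.write_row(row_index, 0, row)

# A column chart whose series point at the cells written above
chart = workbook.add_chart({"type": "column"})
chart.add_series({
    "name": "Total Sales",
    "categories": "=Sales!$A$2:$A$4",
    "values": "=Sales!$B$2:$B$4",
})
chart.set_title({"name": "Total Sales by Product"})
worksheet.insert_chart("D2", chart)
workbook.close()
```

Because the series reference cell ranges, the chart redraws in Excel when the underlying numbers are edited, which a pasted image cannot do.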

Automating Excel Tasks

Automation is where Python really shines. You can automate repetitive tasks like data entry, generating reports, and scheduling scripts to run at specific times. Below are some ways to automate tasks using Excel with Python.

Automating Repetitive Tasks

Python allows you to write scripts to automate data entry and generate reports. Here’s an example of automating the process of updating Excel files: 

import pandas as pd

# Define a function to update Excel files with new data
def update_excel(file_path, new_data):
    df = pd.read_excel(file_path)
    new_rows = pd.DataFrame(new_data)  # Convert the new data to a DataFrame
    updated_df = pd.concat([df, new_rows], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
    updated_df.to_excel(file_path, index=False)  # Save the updated data back to the file

# New data to add
new_data = {'Product': ['D'], 'Quantity': [20], 'Unit Price': [8], 'Total Sales': [160]}

# Update the Excel file
update_excel('sales_data.xlsx', new_data)

Using Python Scripts with Excel Macros (pywin32)

For more advanced automation, you can integrate Python with Excel’s built-in macros using the pywin32 library (pip install pywin32). This approach works only on Windows with Excel installed.

import os
import win32com.client  # Import pywin32 to interact with Excel

# Open Excel using pywin32; Excel needs an absolute path
excel = win32com.client.Dispatch("Excel.Application")
workbook = excel.Workbooks.Open(os.path.abspath('sales_data.xlsm'))  # Macros need a macro-enabled file such as .xlsm

# Run a macro (if you have one set up in your Excel file)
excel.Application.Run("MyMacro")

# Save and close the workbook, then quit Excel
workbook.Save()
workbook.Close()
excel.Quit()

Scheduling Automation with Task Schedulers

To automate scripts at regular intervals, you can schedule your Python scripts using task schedulers like Windows Task Scheduler or cron on macOS/Linux. This allows you to automate tasks like generating daily reports or running data updates.
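
Schedulers usually start a script from a different working directory than your terminal does, so a scheduled script is easier to run if it builds absolute, dated output paths. The sketch below shows one way to structure such a script; the function name report_path, the reports folder, and the cron line in the comment are all hypothetical.

```python
from datetime import date
from pathlib import Path

# Hypothetical cron entry that would run this script every day at 07:00:
#   0 7 * * * /usr/bin/python3 /path/to/daily_report.py


def report_path(output_dir, run_date):
    """Build an absolute, dated path so each scheduled run writes its own file."""
    return Path(output_dir).resolve() / f"sales_report_{run_date:%Y-%m-%d}.xlsx"


def main():
    target = report_path("reports", date.today())
    target.parent.mkdir(parents=True, exist_ok=True)
    # The report-building code (for example df.to_excel(target)) would go here
    print(f"Report would be written to {target}")
```

Calling main() at the bottom of the script makes it runnable by either Windows Task Scheduler or cron.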

Explanation:

  • pandas: Used for data manipulation and analysis, allowing you to filter, sort, and aggregate data easily.
  • matplotlib: A powerful library for visualizing data. It helps create charts like bar graphs, line charts, and scatter plots.
  • xlsxwriter: This library helps you insert charts and images into Excel files, enhancing data visualization.
  • pywin32: Integrates Python with Excel’s built-in features, such as macros, enabling you to automate tasks directly in Excel.
  • Task schedulers: Automate Python scripts to run at specific times, improving efficiency in routine tasks.

Ready to take your Python and Excel skills to the next level? Let’s dive into some advanced techniques and real-world applications.

Advanced Excel with Python Techniques and Applications

In this section, you’ll look at advanced techniques that can make your workflow even more powerful and efficient. 

Additionally, you’ll explore how to export and convert Excel files in Python and how to handle common errors that may arise during automation.

Exporting and Converting Excel Files

Python lets you convert Excel files to other formats such as CSV and JSON, or load them into a database. This is useful when you need to share data with other systems or processes. You can also export charts and reports as PDFs to share insights in a more polished format.

Converting Excel to CSV, JSON, or Databases

Often, you’ll need to export your Excel data to other formats for further analysis or integration with other systems. Python makes this easy using pandas. 

import pandas as pd  # Import pandas for data manipulation

# Read an Excel file
df = pd.read_excel('sales_data.xlsx')

# Convert the Excel data to CSV
df.to_csv('sales_data.csv', index=False)  # Export to CSV

# Convert the Excel data to JSON
df.to_json('sales_data.json', orient='records')  # Export to JSON

# Save data to a database (example with SQLite)
import sqlite3
conn = sqlite3.connect('sales_data.db')
df.to_sql('sales', conn, if_exists='replace', index=False)  # Export to SQLite database
conn.close()
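
Once the data is in SQLite, you can query it with SQL and get a DataFrame back through pd.read_sql. The sketch below uses an in-memory database and the sample products so it runs without any files; the table name sales matches the example above.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Quantity": [10, 15, 12],
    "Unit Price": [5, 7, 6],
})

conn = sqlite3.connect(":memory:")  # In-memory database for the demo
df.to_sql("sales", conn, if_exists="replace", index=False)

# Read back only the rows with Quantity above 10, largest first
top = pd.read_sql(
    "SELECT Product, Quantity FROM sales WHERE Quantity > 10 ORDER BY Quantity DESC",
    conn,
)
conn.close()
print(top)
```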

Exporting Charts and Reports as PDFs

You can also save charts built from your Excel data as PDFs with matplotlib, which can be useful for generating reports or presentations.

import matplotlib.pyplot as plt  # Import matplotlib for plotting

# Create a simple plot (bar chart)
df.plot(kind='bar', x='Product', y='Total Sales')
plt.title('Total Sales by Product')

# Save the plot as a PDF file
plt.savefig('sales_report.pdf', format='pdf')  # Save as PDF

Handling Errors and Debugging

When automating tasks in Excel with Python, errors are bound to occur, especially when dealing with large datasets or complex operations. Let’s look at common issues you might face and tips for debugging.

Common Issues in Excel Automation

  1. Missing Files or Incorrect File Paths: If the path to the Excel file is wrong, the program will fail to load the file.
  2. Large Datasets: Handling large Excel files may slow down your process, causing memory issues or errors.
  3. Data Format Issues: Sometimes, Excel files contain unexpected data types, such as text in a numeric column, which can lead to errors.
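
For the data format issue in particular, pd.to_numeric with errors='coerce' turns unparseable entries into NaN so you can find and fix them instead of crashing mid-calculation. A minimal sketch with a made-up Quantity column that contains text:

```python
import pandas as pd

# Illustrative data: one quantity was typed as a word, one as a numeric string
df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Quantity": [10, "fifteen", "12"],
})

# Values that cannot be parsed as numbers become NaN
df["Quantity"] = pd.to_numeric(df["Quantity"], errors="coerce")

# Rows that need manual attention
bad_rows = df[df["Quantity"].isna()]
print(bad_rows)
```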

Debugging Tips for Handling Large Datasets

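When working with large datasets in Excel files in Python, here are a few tips to avoid issues and simplify debugging:

Read Data in Chunks: Unlike pd.read_csv(), pd.read_excel() does not accept a chunksize parameter. To avoid loading the entire sheet into memory, stream the rows with openpyxl’s read-only mode. Here, process_row is a placeholder for your own logic.

from openpyxl import load_workbook

workbook = load_workbook('large_sales_data.xlsx', read_only=True)  # Stream rows lazily
for row in workbook.active.iter_rows(min_row=2, values_only=True):  # Skip the header row
    process_row(row)  # Process each row (a tuple of cell values) separately
workbook.close()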
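
pd.read_excel() has no chunksize parameter, so chunked processing has to be built by hand. One possible approach, sketched below, streams rows with openpyxl in read-only mode and yields fixed-size DataFrames. The generator name iter_excel_chunks and the 25-row demo file are illustrative.

```python
import os
import tempfile
from itertools import islice

import pandas as pd
from openpyxl import Workbook, load_workbook


def iter_excel_chunks(path, chunksize):
    """Yield DataFrames of at most `chunksize` rows from the first sheet."""
    workbook = load_workbook(path, read_only=True)  # Rows are streamed lazily
    try:
        rows = workbook.active.iter_rows(values_only=True)
        header = list(next(rows))  # First row holds the column names
        while True:
            block = list(islice(rows, chunksize))
            if not block:
                break
            yield pd.DataFrame(block, columns=header)
    finally:
        workbook.close()


# Build a small demo file with 25 data rows
path = os.path.join(tempfile.mkdtemp(), "large_sales_data.xlsx")
demo = Workbook()
demo_sheet = demo.active
demo_sheet.append(["Product", "Quantity"])
for i in range(25):
    demo_sheet.append([f"P{i}", i])
demo.save(path)

chunk_sizes = [len(chunk) for chunk in iter_excel_chunks(path, chunksize=10)]
total_quantity = sum(
    int(chunk["Quantity"].sum()) for chunk in iter_excel_chunks(path, chunksize=10)
)
print(chunk_sizes, total_quantity)
```

Only one chunk is held as a DataFrame at a time, so peak memory depends on chunksize rather than on the size of the whole sheet.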

Optimize Data Types: You can save memory by explicitly defining the data types of columns when reading the data.

df = pd.read_excel('sales_data.xlsx', dtype={'Product': 'category', 'Quantity': 'int32'})
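
To see what the dtype choice buys you, compare memory_usage before and after the conversion. The sketch below uses synthetic in-memory columns (30,000 rows with three repeating product names) rather than a real file.

```python
import pandas as pd

# Synthetic columns standing in for a large sheet
products = pd.Series(["A", "B", "C"] * 10_000, dtype="object")
quantities = pd.Series(range(30_000), dtype="int64")

# category stores each distinct label once plus a small integer code per row
as_category = products.astype("category")
# int32 uses half the bytes per value of int64
as_int32 = quantities.astype("int32")

text_bytes = products.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
int64_bytes = quantities.memory_usage(deep=True)
int32_bytes = as_int32.memory_usage(deep=True)

print(f"Product:  {text_bytes} -> {category_bytes} bytes")
print(f"Quantity: {int64_bytes} -> {int32_bytes} bytes")
```

The category saving is largest when a column has few distinct values relative to its length; for columns of mostly unique strings it may not help.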

Use Try-Except for Error Handling: When automating Excel tasks, use try-except blocks to handle errors gracefully.

try:
    df = pd.read_excel('sales_data.xlsx')
except FileNotFoundError:
    print("The Excel file was not found. Please check the file path.")

Real-World Project: Automating an Excel Report

In this section, you’ll walk through a real-world example of automating an Excel report using Python. You’ll focus on automating the creation of a financial report, which is a common task in financial modeling and reporting.   

Let’s say you need to generate a monthly financial report that includes revenue, expenses, and profit. You need to read data from an Excel file, calculate totals and profit margins, and then export the results into a neatly formatted report.

Step-by-Step Implementation

1. Prepare the Excel Data

Assume that the Excel file contains sales data, including columns for product names, sales, and costs.
Here's how the data might look in financial_data.xlsx:

Product  Sales  Cost
A        10000  5000
B        15000  7000
C        12000  6000

2. Read the Data from Excel

First, you will read the data from the Excel file using pandas. 

import pandas as pd  # Import pandas for data manipulation

# Read the Excel file into a DataFrame
file_path = 'financial_data.xlsx'
df = pd.read_excel(file_path)

# Display the first few rows to check the data
print(df.head())

3. Calculate Financial Metrics

You will now calculate the profit for each product, then the profit margin as (Profit / Sales) * 100, and finally the total revenue, total cost, and total profit.

# Calculate profit for each product
df['Profit'] = df['Sales'] - df['Cost']  # Profit = Sales - Cost

# Calculate profit margin (as a percentage)
df['Profit Margin'] = (df['Profit'] / df['Sales']) * 100

# Calculate total revenue, total cost, and total profit
total_revenue = df['Sales'].sum()
total_cost = df['Cost'].sum()
total_profit = df['Profit'].sum()

# Print the updated DataFrame and total metrics
print(df)
print(f"Total Revenue: {total_revenue}")
print(f"Total Cost: {total_cost}")
print(f"Total Profit: {total_profit}")

Output:

After running the code above, your DataFrame might look like this:

Product  Sales  Cost  Profit  Profit Margin
A        10000  5000  5000    50.0
B        15000  7000  8000    53.33
C        12000  6000  6000    50.0

Additionally, the total metrics would be: 

Total Revenue: 37000
Total Cost: 18000
Total Profit: 19000

4. Create a Financial Summary Report

Now, let’s create a summary report by writing the data and calculations to a new Excel file. Writing through pd.ExcelWriter with the xlsxwriter engine lets you add a formatted summary to the same worksheet.

import pandas as pd  # pandas writes the DataFrame; xlsxwriter is used as the engine

# Create the report and write the DataFrame to a 'Financial Data' worksheet
with pd.ExcelWriter('financial_report.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Financial Data', index=False)

    # Get the xlsxwriter workbook and worksheet objects behind the writer
    workbook = writer.book
    worksheet = writer.sheets['Financial Data']

    # Format the summary section (bold the header)
    # Rows 1-4 hold the column headers and the three products
    bold = workbook.add_format({'bold': True})
    worksheet.write('A5', 'Summary', bold)

    # Create a summary section at the bottom
    worksheet.write('A6', 'Total Revenue')
    worksheet.write('B6', total_revenue)
    worksheet.write('A7', 'Total Cost')
    worksheet.write('B7', total_cost)
    worksheet.write('A8', 'Total Profit')
    worksheet.write('B8', total_profit)

# The workbook is saved automatically when the with block exits

Output:

The file financial_report.xlsx will contain:

  1. A worksheet with the original data, including calculations for Profit and Profit Margin.
  2. A summary section with the total revenue, cost, and profit at the bottom.

This technique can be applied to various financial analysis tasks, such as budgeting, forecasting, and profit analysis, making your workflow more efficient and streamlined.

How Can upGrad Support Your Learning?

With a global network of over 10 million learners, upGrad provides access to a wide range of industry-focused courses designed for both beginners and seasoned professionals. 

You can also get personalized career counseling with upGrad to guide your career path, or visit your nearest upGrad center and start hands-on training today!
