Excel with Python: Automation, Data Handling, and Advanced Techniques for 2025
By Rohit Sharma
Updated on Mar 19, 2025 | 18 min read | 8.0k views
Excel with Python opens up powerful ways to automate tasks and manage data effortlessly. Working with Excel files in Python allows you to handle large datasets, perform complex calculations, and automate repetitive tasks.
Python makes it easy to work with Excel files and handle data in a much more organized and streamlined way. By the end, you’ll learn advanced techniques to save time and increase your productivity with Excel and Python.
Excel is a powerful tool for managing and analyzing data, but doing tasks manually can be time-consuming and prone to errors. This is where Python comes in. By using Python, you can automate repetitive tasks, manipulate data, and create reports with a few lines of code, making your work faster and more efficient.
This section shows how Python can take your Excel work to the next level by automating tasks and improving data efficiency.
Benefits of using Python for Excel tasks
Key libraries: pandas, openpyxl, xlsxwriter, xlrd, pyexcel
Python offers several libraries that make working with Excel files easy and powerful. Here are some of the most commonly used libraries:
Next, let’s set up Python for Excel and get you started.
Before diving into the world of Excel with Python, let’s get your environment set up. First, you’ll need a few libraries to work with Excel files in Python. The libraries we’ll use are pandas, openpyxl, and xlsxwriter, which will help you handle data and manipulate Excel files efficiently. Below is the process to install and use them.
Installing Required Libraries
To get started, you'll need to install the necessary libraries using pip. Here’s how you can do it:
pip install pandas openpyxl xlsxwriter
After installing libraries with pip install, verify them by running:
import pandas as pd
print(pd.__version__)
This helps troubleshoot version compatibility issues.
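To check all three libraries in one go, a small helper along these lines may be useful. It is only a sketch: the function name check_libraries and the REQUIRED list are made up for illustration, and it relies solely on the standard library, so it runs even when a package is missing.

```python
import importlib.util
from importlib import metadata

# Libraries used in this tutorial (the list itself is an assumption for this sketch)
REQUIRED = ["pandas", "openpyxl", "xlsxwriter"]


def check_libraries(names):
    """Return {name: installed version, or None if the library is missing}."""
    status = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            status[name] = None  # The library cannot be imported
        else:
            try:
                status[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                status[name] = "unknown"  # Importable, but no package metadata found
    return status


status = check_libraries(REQUIRED)
for name, version in status.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any library reported as NOT INSTALLED can then be installed with pip before you continue.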
Importing Libraries in Python
Once the libraries are installed, you can start using them in your Python script. Here’s how to import the necessary libraries:
import pandas as pd # Import pandas for data manipulation
import openpyxl # Import openpyxl to read and write Excel files
import xlsxwriter # Import xlsxwriter to create new Excel files and add formatting
Loading an Excel File into Python
Now, let’s load an Excel file in Python and explore its contents using pandas. This example assumes you already have an Excel file named sample_data.xlsx.
The file contains a simple dataset with three columns: Product, Quantity, and Unit Price.
Here’s the code:
# Load an Excel file into a pandas DataFrame
file_path = 'sample_data.xlsx' # Specify the file path
df = pd.read_excel(file_path) # Use pandas to read the Excel file into a DataFrame
# Display the first few rows of the file to check its content
print(df.head()) # This will display the first 5 rows of the Excel file
Output:
Let’s assume your sample_data.xlsx file contains a simple dataset like this:
Product | Quantity | Unit Price
A | 10 | 5
B | 15 | 7
C | 12 | 6
After running the code above, the output would look like this:
Product Quantity Unit Price
0 A 10 5
1 B 15 7
2 C 12 6
Now that you're set up, let's dive into working with Excel files and making Python your go-to tool for data handling.
Now that your environment is set up, it’s time to get hands-on with Excel files in Python. Whether you're reading, writing, or modifying Excel files, Python makes these tasks straightforward and efficient.
Reading data from an Excel file in Python is a breeze with pandas. Here are the steps:
Using pandas.read_excel()
This method is the most commonly used to load data from an Excel file into a pandas DataFrame, which is ideal for analyzing and manipulating the data.
import pandas as pd # Import pandas library
# Read an Excel file into a DataFrame
file_path = 'sample_data.xlsx'
df = pd.read_excel(file_path) # Load data from the Excel file into pandas DataFrame
# Display the first few rows
print(df.head())
Reading Specific Sheets
If your Excel file contains multiple sheets, you can specify the sheet you want to load.
# Read a specific sheet
df = pd.read_excel(file_path, sheet_name='Sheet2') # Specify the sheet name
# Display the first few rows of the specified sheet
print(df.head())
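If you do not know the sheet names in advance, passing sheet_name=None asks pandas for every sheet at once. The sketch below builds a small two-sheet workbook in a temporary folder first so it can run on its own; the file name and the sample rows are invented for the example.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained
# (file name and contents are placeholders, not part of the tutorial data)
path = os.path.join(tempfile.mkdtemp(), "multi_sheet.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"Product": ["A", "B"], "Quantity": [10, 15]}).to_excel(
        writer, sheet_name="Sheet1", index=False
    )
    pd.DataFrame({"Product": ["C"], "Quantity": [12]}).to_excel(
        writer, sheet_name="Sheet2", index=False
    )

# sheet_name=None returns a dict of {sheet name: DataFrame}
all_sheets = pd.read_excel(path, sheet_name=None)
for name, sheet_df in all_sheets.items():
    print(name, sheet_df.shape)

# List the sheet names through pd.ExcelFile
with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names
print(sheet_names)
```

The dictionary form is handy when each sheet holds the same kind of data, for example one sheet per month.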
Handling Large Datasets Efficiently
When working with large datasets, you may want to load only a portion of the data. You can specify parameters like usecols or nrows to limit what’s loaded.
# Read only specific columns
df_cols = pd.read_excel(file_path, usecols=['Product', 'Quantity']) # Only load 'Product' and 'Quantity' columns
# Read only a specific number of rows
df_rows = pd.read_excel(file_path, nrows=10) # Load only the first 10 rows
# Display the column-limited result
print(df_cols.head())
Output:
Assuming the Excel file contains sales data, the output will display something like:
Product Quantity
0 A 10
1 B 15
2 C 12
Once you've worked with your data, you may want to save it back to an Excel file. Here’s how to do it:
Creating New Excel Files
# Create a new DataFrame
data = {'Product': ['A', 'B', 'C'], 'Quantity': [10, 15, 12], 'Unit Price': [5, 7, 6]}
df = pd.DataFrame(data) # Create a new DataFrame from a dictionary
# Write the DataFrame to a new Excel file
df.to_excel('new_sample_data.xlsx', index=False) # Save without including row indices
Writing DataFrames to Excel
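If you already have a DataFrame and want to write it to an existing Excel file, use pd.ExcelWriter in append mode. Append mode requires the openpyxl engine, because xlsxwriter can only create new files and would overwrite the existing workbook. With if_sheet_exists='replace' (pandas 1.3 or later), a sheet that already has the same name is replaced.
with pd.ExcelWriter('existing_data.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False) # Write DataFrame to a sheet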
Formatting Cells with xlsxwriter
You can add advanced formatting when creating a new Excel file by using xlsxwriter.
import xlsxwriter # Import the xlsxwriter library
# Create a new workbook and add a worksheet
workbook = xlsxwriter.Workbook('formatted_data.xlsx')
worksheet = workbook.add_worksheet()
# Define the formatting first
bold = workbook.add_format({'bold': True})
# Write the header cells with bold formatting
worksheet.write('A1', 'Product', bold)
worksheet.write('B1', 'Quantity', bold)
# Close the workbook to save the file
workbook.close()
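Beyond bold headers, xlsxwriter can also apply number formats, column widths, and frozen header rows. The following is a minimal sketch using the sample products from earlier; the output file name, sheet name, and colour are arbitrary choices for illustration.

```python
import os
import tempfile

import xlsxwriter

# Write to a temporary folder so the sketch does not touch your own files
out_path = os.path.join(tempfile.mkdtemp(), "formatted_prices.xlsx")
workbook = xlsxwriter.Workbook(out_path)
worksheet = workbook.add_worksheet("Prices")

# Reusable formats: a shaded bold header and a two-decimal number format
header_fmt = workbook.add_format({"bold": True, "bg_color": "#DDEBF7", "border": 1})
money_fmt = workbook.add_format({"num_format": "#,##0.00"})

rows = [("A", 10, 5), ("B", 15, 7), ("C", 12, 6)]
worksheet.write_row(0, 0, ["Product", "Quantity", "Unit Price"], header_fmt)
for r, (product, qty, price) in enumerate(rows, start=1):
    worksheet.write(r, 0, product)
    worksheet.write(r, 1, qty)
    worksheet.write(r, 2, price, money_fmt)  # Shown as 5.00, 7.00, 6.00 in Excel

worksheet.set_column(0, 2, 14)  # Widen columns A to C
worksheet.freeze_panes(1, 0)    # Keep the header row visible while scrolling
workbook.close()                # The file is only written when the workbook is closed
```

Defining each format once and reusing it keeps the workbook small and the script easy to adjust.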
Sometimes you need to update an existing Excel file in Python. Here's how you can edit, add, and remove data:
Editing Existing Files
You can modify data directly within an existing file. Here’s how to change a value:
import openpyxl # Import openpyxl for modifying Excel files
# Load the existing Excel file
workbook = openpyxl.load_workbook('sample_data.xlsx')
sheet = workbook.active # Get the active sheet
# Modify a specific cell
sheet['A2'] = 'Updated Product' # Change the value in cell A2
# Save the changes
workbook.save('modified_data.xlsx')
Adding and Deleting Sheets
You can also add or remove sheets from an existing Excel file.
# Add a new sheet
workbook.create_sheet('NewSheet')
# Remove a sheet
workbook.remove(workbook['Sheet1']) # Removes 'Sheet1' from the workbook
# Save the changes
workbook.save('updated_data.xlsx')
Updating Specific Cell Values
If you need to update specific cells based on certain conditions, you can loop through the rows.
# Loop through rows and update based on conditions
for row in sheet.iter_rows(min_row=2, max_row=5, min_col=2, max_col=2): # Specify range
    for cell in row:
        if cell.value == 10: # If the quantity is 10
            cell.value = 20 # Update the value to 20
# Save the updated file
workbook.save('updated_values.xlsx')
Output:
Assuming your file had Quantity = 10 in some cells, those would be updated to 20 after running the script.
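To see the conditional update end to end, the sketch below creates the sample workbook in a temporary folder, applies the same rule, and reads the result back. It uses sheet.max_row instead of a fixed row limit, which is one possible way to cover sheets of any length; the file locations are placeholders.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "sample_data.xlsx")
dst = os.path.join(tmp_dir, "updated_values.xlsx")

# Create the sample workbook used throughout this article
wb = Workbook()
ws = wb.active
for row in [("Product", "Quantity", "Unit Price"), ("A", 10, 5), ("B", 15, 7), ("C", 12, 6)]:
    ws.append(row)
wb.save(src)

# Reload it and update every Quantity of 10 to 20
wb = load_workbook(src)
ws = wb.active
changed = 0
for (cell,) in ws.iter_rows(min_row=2, max_row=ws.max_row, min_col=2, max_col=2):
    if cell.value == 10:
        cell.value = 20
        changed += 1
wb.save(dst)

# Read column B back (skipping the header) to confirm the change
quantities = [c.value for c in load_workbook(dst).active["B"][1:]]
print(changed, quantities)
```

Counting the changed cells gives a quick sanity check that the rule matched what you expected.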
Now that you know how to work with Excel files, let’s step up your game by automating tasks and diving into data analysis.
Automation helps you save time on manual tasks, while Python’s data analysis tools allow you to process large datasets quickly and efficiently.
Let’s break it down into key areas: Data Manipulation and Analysis, Visualizing Data in Excel, and Automating Excel Tasks.
Data Manipulation and Analysis
Python makes it simple to manipulate and analyze data within Excel files. Here are some techniques for working with your Excel data:
Filtering and Sorting Data
You can filter and sort data in Excel using pandas, which offers a straightforward approach to handle large datasets.
import pandas as pd # Import pandas for data manipulation
# Read the Excel file into a DataFrame
file_path = 'sales_data.xlsx'
df = pd.read_excel(file_path)
# Filter data where Quantity > 10
filtered_df = df[df['Quantity'] > 10]
# Sort data by 'Product' column
sorted_df = df.sort_values(by='Product')
# Display the filtered and sorted DataFrame
print(filtered_df.head())
print(sorted_df.head())
Output:
Assuming your sales_data.xlsx file contains sales data, the filtered data would look like:
Product Quantity Unit Price
1 B 15 7
2 C 12 6
Applying Formulas in Excel Using Python
You can reproduce spreadsheet-style calculations in Python using pandas. Note that pandas computes the result itself and saves plain values, not live Excel formulas. Here’s how to calculate the total sales for each row:
# Compute a 'Total Sales' column from the existing columns
df['Total Sales'] = df['Quantity'] * df['Unit Price']
# Save the updated data (the new column is stored as values)
df.to_excel('sales_with_total.xlsx', index=False)
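If the workbook should contain live formulas that Excel recalculates, one option is to write the formula text with openpyxl. This is a sketch with a made-up file name; openpyxl stores the formula as text and does not evaluate it, so the results appear once the file is opened in Excel.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "sales_with_formulas.xlsx")

wb = Workbook()
ws = wb.active
ws.append(["Product", "Quantity", "Unit Price", "Total Sales"])
for product, qty, price in [("A", 10, 5), ("B", 15, 7), ("C", 12, 6)]:
    ws.append([product, qty, price])

# Write a per-row formula such as =B2*C2 into column D
last_data_row = ws.max_row
for r in range(2, last_data_row + 1):
    ws.cell(row=r, column=4, value=f"=B{r}*C{r}")

# Add a grand total below the data
total_row = last_data_row + 1
ws.cell(row=total_row, column=3, value="Total")
ws.cell(row=total_row, column=4, value=f"=SUM(D2:D{last_data_row})")
wb.save(path)

# Reading the file back shows the stored formula text, not a computed number
formula_d2 = load_workbook(path).active["D2"].value
total_formula = load_workbook(path).active.cell(row=total_row, column=4).value
print(formula_d2, total_formula)
```

Choose values (pandas) when the file feeds another program, and formulas (openpyxl or xlsxwriter) when people will keep editing the sheet in Excel.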
Aggregating Data (Sum, Average, Pivot Tables)
With pandas, you can easily aggregate data using functions like sum(), mean(), and pivot tables.
# Calculate the total sum of sales
total_sales = df['Total Sales'].sum()
# Calculate the average sales
average_sales = df['Total Sales'].mean()
# Create a pivot table
pivot_table = df.pivot_table(values='Total Sales', index='Product', aggfunc='sum')
# Print the results
print(total_sales)
print(average_sales)
print(pivot_table)
Output:
The pivot_table will aggregate total sales per product:
Product
A 50
B 105
C 72
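Pivot tables become more interesting when a product appears in several rows. As an alternative to pivot_table, the sketch below uses groupby with named aggregations on a small invented dataset; the column names units, revenue, and avg_order are illustrative only.

```python
import pandas as pd

# Invented order data in which products A and B appear more than once
sales = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B"],
    "Quantity": [10, 15, 4, 12, 5],
    "Unit Price": [5, 7, 5, 6, 7],
})
sales["Total Sales"] = sales["Quantity"] * sales["Unit Price"]

# One row per product: units sold, revenue, and average order value
summary = sales.groupby("Product").agg(
    units=("Quantity", "sum"),
    revenue=("Total Sales", "sum"),
    avg_order=("Total Sales", "mean"),
)
print(summary)
```

Named aggregations let you compute several statistics in one pass and give each result column a readable name.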
Python excels at data visualization, and you can embed charts directly into your Excel sheets using libraries like xlsxwriter and matplotlib.
Creating Charts and Graphs with Python
You can create various types of charts like bar charts, line graphs, and pie charts using matplotlib, and then embed them into Excel files with xlsxwriter.
import xlsxwriter # Import xlsxwriter for Excel file creation
import matplotlib.pyplot as plt # Import matplotlib for creating charts
# Create a plot using matplotlib
df.plot(kind='bar', x='Product', y='Total Sales')
# Save the plot as an image
plt.savefig('sales_chart.png')
# Create a new Excel file and add the plot as an image
workbook = xlsxwriter.Workbook('sales_report.xlsx')
worksheet = workbook.add_worksheet()
# Insert the chart into the worksheet
worksheet.insert_image('A1', 'sales_chart.png')
# Save the workbook
workbook.close()
Output:
The sales_report.xlsx file will contain a chart embedded within it.
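An inserted image is static. If you would rather have a native Excel chart that readers can restyle or that updates when the cells change, xlsxwriter can create one directly from worksheet data. A minimal sketch, assuming the per-product totals computed earlier and an arbitrary output file name:

```python
import os
import tempfile

import xlsxwriter

path = os.path.join(tempfile.mkdtemp(), "sales_native_chart.xlsx")
workbook = xlsxwriter.Workbook(path)
worksheet = workbook.add_worksheet("Sales")

# Write the data the chart will reference
data = [("A", 50), ("B", 105), ("C", 72)]
worksheet.write_row("A1", ["Product", "Total Sales"])
for r, row in enumerate(data, start=1):
    worksheet.write_row(r, 0, row)

# Build a column chart; ranges are [sheet, first_row, first_col, last_row, last_col]
chart = workbook.add_chart({"type": "column"})
chart.add_series({
    "name": "Total Sales",
    "categories": ["Sales", 1, 0, len(data), 0],
    "values": ["Sales", 1, 1, len(data), 1],
})
chart.set_title({"name": "Total Sales by Product"})

worksheet.insert_chart("D2", chart)  # Place the chart next to the data
workbook.close()
```

Because the chart points at cell ranges, editing a value in column B in Excel redraws the bars automatically.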
Automation is where Python really shines. You can automate repetitive tasks like data entry, generating reports, and scheduling scripts to run at specific times. Below are some ways to automate tasks using Excel with Python.
Automating Repetitive Tasks
Python allows you to write scripts to automate data entry and generate reports. Here’s an example of automating the process of updating Excel files:
import pandas as pd
# Define a function to update Excel files with new data
def update_excel(file_path, new_data):
    df = pd.read_excel(file_path)
    new_rows = pd.DataFrame(new_data) # Convert the new data to a DataFrame
    updated_df = pd.concat([df, new_rows], ignore_index=True) # DataFrame.append() was removed in pandas 2.0
    updated_df.to_excel(file_path, index=False) # Save the updated data back to the file
# New data to add
new_data = {'Product': ['D'], 'Quantity': [20], 'Unit Price': [8], 'Total Sales': [160]}
# Update the Excel file
update_excel('sales_data.xlsx', new_data)
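Another common repetitive task is merging several workbooks into one. The sketch below is one way to do it with glob and pd.concat; it first creates two small monthly files in a temporary folder so it can run on its own, and the file naming pattern sales_*.xlsx is an assumption.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two sample monthly files (placeholder data) in a temporary folder
folder = tempfile.mkdtemp()
for month, qty in [("jan", 10), ("feb", 15)]:
    pd.DataFrame({"Product": ["A"], "Quantity": [qty]}).to_excel(
        os.path.join(folder, f"sales_{month}.xlsx"), index=False
    )

# Read every matching workbook and remember which file each row came from
frames = []
for path in sorted(glob.glob(os.path.join(folder, "sales_*.xlsx"))):
    frame = pd.read_excel(path)
    frame["Source File"] = os.path.basename(path)
    frames.append(frame)

# Stack the rows and save a single combined workbook
combined = pd.concat(frames, ignore_index=True)
combined.to_excel(os.path.join(folder, "combined.xlsx"), index=False)
print(combined)
```

Keeping a Source File column makes it easy to trace a suspicious row back to the workbook it came from.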
Using Python Scripts with Excel Macros (pywin32)
For more advanced automation, you can integrate Python with Excel’s built-in macros using the pywin32 library.
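import os # Needed to build an absolute path
import win32com.client # Import pywin32 to interact with Excel (Windows only)
# Open Excel using pywin32
excel = win32com.client.Dispatch("Excel.Application")
# Excel needs an absolute path, and macros can only be stored in a macro-enabled workbook (.xlsm)
workbook = excel.Workbooks.Open(os.path.abspath('sales_data.xlsm'))
# Run a macro (if you have one set up in your Excel file)
excel.Application.Run("MyMacro")
# Save and close the workbook, then quit Excel so no background process is left running
workbook.Save()
workbook.Close()
excel.Quit()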
Scheduling Automation with Task Schedulers
To automate scripts at regular intervals, you can schedule your Python scripts using task schedulers like Windows Task Scheduler or cron on macOS/Linux. This allows you to automate tasks like generating daily reports or running data updates.
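A scheduled script usually benefits from a clear entry point and a dated output name, so each run produces its own file. The following is only a sketch: the function name report_path, the reports folder, and the paths in the crontab comment are all placeholders.

```python
from datetime import date
from pathlib import Path


def report_path(output_dir, run_date):
    """Build a dated file name such as reports/sales_report_2025-03-19.xlsx."""
    return Path(output_dir) / f"sales_report_{run_date.isoformat()}.xlsx"


def main():
    target = report_path("reports", date.today())
    # ... generate the report here (for example with pandas) and save it to `target` ...
    print(f"Report would be written to {target}")


if __name__ == "__main__":
    main()

# Example crontab entry (Linux/macOS) that runs the script every day at 07:00.
# Both paths are placeholders; adjust them to your interpreter and script:
# 0 7 * * * /usr/bin/python3 /path/to/daily_report.py
```

On Windows, the same script can be registered in Task Scheduler by pointing the action at python.exe and passing the script path as the argument.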
Ready to take your Python and Excel skills to the next level? Let’s dive into some advanced techniques and real-world applications.
In this section, you’ll look at advanced techniques that can make your workflow even more powerful and efficient.
Additionally, you’ll explore how to export and convert Excel files in Python and how to handle common errors that may arise during automation.
Python allows you to easily convert Excel files in Python to other formats such as CSV, JSON, or even databases. This is incredibly useful when you need to share data with other systems or processes. You can also export your charts and reports as PDFs to share insights in a more professional format.
Converting Excel to CSV, JSON, or Databases
Often, you’ll need to export your Excel data to other formats for further analysis or integration with other systems. Python makes this easy using pandas.
import pandas as pd # Import pandas for data manipulation
# Read an Excel file
df = pd.read_excel('sales_data.xlsx')
# Convert the Excel data to CSV
df.to_csv('sales_data.csv', index=False) # Export to CSV
# Convert the Excel data to JSON
df.to_json('sales_data.json', orient='records') # Export to JSON
# Save data to a database (example with SQLite)
import sqlite3
conn = sqlite3.connect('sales_data.db')
df.to_sql('sales', conn, if_exists='replace', index=False) # Export to SQLite database
conn.close()
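Once the data is in a database, you can query it with SQL and load the result straight back into pandas. The sketch below uses an in-memory SQLite database and the sample products from this article so that nothing is written to disk; the query itself is just an illustration.

```python
import sqlite3

import pandas as pd

df_sales = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Quantity": [10, 15, 12],
    "Unit Price": [5, 7, 6],
})

# ":memory:" creates a throwaway database that disappears when the connection closes
conn = sqlite3.connect(":memory:")
df_sales.to_sql("sales", conn, if_exists="replace", index=False)

# Column names containing spaces must be quoted in SQL
query = (
    'SELECT Product, Quantity * "Unit Price" AS total '
    "FROM sales WHERE Quantity > ? ORDER BY total DESC"
)
top = pd.read_sql_query(query, conn, params=(10,))
conn.close()
print(top)
```

Passing values through params rather than formatting them into the SQL string avoids quoting mistakes and injection problems.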
Exporting Charts and Reports as PDFs
You can also export your Excel charts in Python as PDFs, which can be useful for generating reports or presentations.
import matplotlib.pyplot as plt # Import matplotlib for plotting
# Create a simple plot (bar chart)
df.plot(kind='bar', x='Product', y='Total Sales')
plt.title('Total Sales by Product')
# Save the plot as a PDF file
plt.savefig('sales_report.pdf', format='pdf') # Save as PDF
When automating tasks in Excel with Python, errors are bound to occur, especially when dealing with large datasets or complex operations. Let’s look at common issues you might face and tips for debugging.
Common Issues in Excel Automation
Debugging Tips for Handling Large Datasets
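When working with large datasets in Excel files in Python, here are a few tips to avoid issues and simplify debugging:
Read Data in Chunks: Instead of loading the entire dataset into memory, read it in smaller pieces. Note that pandas.read_excel() has no chunksize parameter (that option belongs to read_csv), so combine skiprows and nrows to load one slice at a time. Each call re-opens the workbook, so pick a reasonably large chunk size.
chunksize = 10000 # Number of rows per chunk
start = 0 # Number of data rows already processed
while True:
    chunk = pd.read_excel('large_sales_data.xlsx', skiprows=range(1, start + 1), nrows=chunksize) # Keep the header, skip rows already read
    if chunk.empty:
        break
    process_chunk(chunk) # Process each chunk separately
    start += chunksize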
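Another way to keep memory usage low is openpyxl's read-only mode, which streams rows from the file instead of loading the whole workbook. The sketch below groups the streamed rows into batches; the helper name iter_batches and the demo file are assumptions made for this example.

```python
import os
import tempfile
from itertools import islice

from openpyxl import Workbook, load_workbook


def iter_batches(path, batch_size):
    """Yield lists of row tuples (header excluded) from the first sheet."""
    wb = load_workbook(path, read_only=True)  # Stream rows lazily
    try:
        rows = wb.worksheets[0].iter_rows(min_row=2, values_only=True)
        while True:
            batch = list(islice(rows, batch_size))
            if not batch:
                break
            yield batch
    finally:
        wb.close()  # Read-only workbooks keep the file open until closed


# Demo: build a 25-row workbook and read it back in batches of 10
path = os.path.join(tempfile.mkdtemp(), "large_sales_data.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["Product", "Quantity"])
for i in range(25):
    ws.append([f"P{i}", i])
wb.save(path)

batch_sizes = [len(batch) for batch in iter_batches(path, batch_size=10)]
print(batch_sizes)
```

Each batch is a plain list of tuples, so it can be passed to pd.DataFrame(...) if you want to continue in pandas.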
Optimize Data Types: You can save memory by explicitly defining the data types of columns when reading the data.
df = pd.read_excel('sales_data.xlsx', dtype={'Product': 'category', 'Quantity': 'int32'})
Use Try-Except for Error Handling: When automating Excel tasks, use try-except blocks to handle errors gracefully.
try:
    df = pd.read_excel('sales_data.xlsx')
except FileNotFoundError:
    print("The Excel file was not found. Please check the file path.")
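In an automated script it can help to wrap the loading step in a function that handles the most likely failures and returns None when loading fails. This is a sketch under assumptions: the function name load_sheet is invented, and the ValueError branch reflects how recent pandas versions report a missing sheet or an unrecognised file format.

```python
import pandas as pd


def load_sheet(path, sheet_name=0):
    """Return the sheet as a DataFrame, or None if it could not be loaded."""
    try:
        return pd.read_excel(path, sheet_name=sheet_name)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"Cannot open {path}. Is it open in Excel or locked by another process?")
    except ValueError as exc:
        # Typically a missing sheet name or an unrecognised file format
        print(f"Could not read {path}: {exc}")
    return None


result = load_sheet("this_file_does_not_exist.xlsx")
if result is None:
    print("Skipping this file and continuing with the rest of the job.")
```

Returning None lets a batch job skip one bad file and carry on, instead of stopping at the first error.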
In this section, you’ll walk through a real-world example of automating an Excel report using Python. You’ll focus on automating the creation of a financial report, which is a common task in financial modeling and reporting.
Let’s say you need to generate a monthly financial report that includes revenue, expenses, and profit. You need to read data from an Excel file, calculate totals and profit margins, and then export the results into a neatly formatted report.
Step-by-Step Implementation
1. Prepare the Excel Data:
Assume that the Excel file contains sales data, including columns for product names, sales, and costs.
Here's how the data might look in financial_data.xlsx:
Product | Sales | Cost
A | 10000 | 5000
B | 15000 | 7000
C | 12000 | 6000
2. Read the Data from Excel:
First, you will read the data from the Excel file using pandas.
import pandas as pd # Import pandas for data manipulation
# Read the Excel file into a DataFrame
file_path = 'financial_data.xlsx'
df = pd.read_excel(file_path)
# Display the first few rows to check the data
print(df.head())
3. Calculate Financial Metrics:
You will now calculate total revenue, total cost, and profit for each product. Then, calculate profit margin as (Profit / Sales) * 100.
# Calculate total revenue, total cost, and profit for each product
df['Profit'] = df['Sales'] - df['Cost'] # Profit = Sales - Cost
# Calculate profit margin (as a percentage)
df['Profit Margin'] = (df['Profit'] / df['Sales']) * 100
# Calculate total revenue, total cost, and total profit
total_revenue = df['Sales'].sum()
total_cost = df['Cost'].sum()
total_profit = df['Profit'].sum()
# Print the updated DataFrame and total metrics
print(df)
print(f"Total Revenue: {total_revenue}")
print(f"Total Cost: {total_cost}")
print(f"Total Profit: {total_profit}")
Output:
After running the code above, your DataFrame might look like this:
Product | Sales | Cost | Profit | Profit Margin
A | 10000 | 5000 | 5000 | 50.0
B | 15000 | 7000 | 8000 | 53.33
C | 12000 | 6000 | 6000 | 50.0
Additionally, the total metrics would be:
Total Revenue: 37000
Total Cost: 18000
Total Profit: 19000
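The figures above can be double-checked with plain Python arithmetic, which is a useful habit before trusting an automated report. The sketch below repeats the calculation on the three sample rows; the overall margin at the end is an extra figure added for illustration.

```python
# (product, sales, cost) rows from financial_data.xlsx
rows = [("A", 10000, 5000), ("B", 15000, 7000), ("C", 12000, 6000)]

report = []
for product, sales, cost in rows:
    profit = sales - cost                       # Profit = Sales - Cost
    margin = round(profit / sales * 100, 2)     # Profit margin in percent
    report.append((product, profit, margin))

total_revenue = sum(sales for _, sales, _ in rows)
total_cost = sum(cost for _, _, cost in rows)
total_profit = total_revenue - total_cost
overall_margin = round(total_profit / total_revenue * 100, 2)

for product, profit, margin in report:
    print(product, profit, margin)
print(total_revenue, total_cost, total_profit, overall_margin)
```

For product B, the profit is 15000 - 7000 = 8000, and 8000 / 15000 is about 53.33 percent, matching the table.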
4. Create a Financial Summary Report:
Now, let’s create a summary report by writing the data and the calculated totals to a new Excel file. pandas writes the DataFrame through its xlsxwriter engine, and the same worksheet then receives a formatted summary.
import pandas as pd # pandas drives xlsxwriter through ExcelWriter
# Create a new Excel workbook through pandas so the data and the summary share one sheet
with pd.ExcelWriter('financial_report.xlsx', engine='xlsxwriter') as writer:
    # Write the DataFrame to the worksheet (header in row 1, products in rows 2 to 4)
    df.to_excel(writer, sheet_name='Financial Data', index=False)
    workbook = writer.book # The underlying xlsxwriter workbook
    worksheet = writer.sheets['Financial Data'] # The worksheet pandas just created
    # Format the summary section (bold the header)
    bold = workbook.add_format({'bold': True})
    worksheet.write('A5', 'Summary', bold)
    # Create a summary section at the bottom
    worksheet.write('A6', 'Total Revenue')
    worksheet.write('B6', total_revenue)
    worksheet.write('A7', 'Total Cost')
    worksheet.write('B7', total_cost)
    worksheet.write('A8', 'Total Profit')
    worksheet.write('B8', total_profit)
# The workbook is saved automatically when the with block ends
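Output:
The file financial_report.xlsx will contain a Financial Data sheet with the product rows (Sales, Cost, Profit, and Profit Margin), followed by a bold Summary heading and the total revenue, total cost, and total profit.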
This technique can be applied to various financial analysis tasks, such as budgeting, forecasting, and profit analysis, making your workflow more efficient and streamlined.