Data Frames in Python: Python In-depth Tutorial
Updated on Jun 16, 2023 | 15 min read | 7.7k views
If you are a developer or coder who works in Python, you are probably familiar with Pandas, one of the most popular Python libraries for data management. Over the years, Pandas has emerged as a standard tool for data analysis and management in Python.
Pandas is one of the most versatile Python packages for data science. It provides powerful, expressive, and flexible data structures for easy data manipulation and analysis, and the Data Frame is one of these structures.
This is the topic of this post: we'll introduce you to the basic data structure of Pandas, the Pandas Data Frame.
According to the Pandas library documentation, a Data Frame is a “two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns)”. In simple words, a Data Frame is a data structure wherein data is aligned in a tabular fashion, that is, in rows and columns.
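A Data Frame usually has the following characteristics: its columns can hold different data types, it is size-mutable (rows and columns can be added or removed), its rows and columns are labelled, and you can perform arithmetic operations on rows and columns.
In a Pandas Data Frame, you can also specify the index and the column names. The index labels the rows, and the column names label the columns.
Creating a Data Frame is the first step for data munging in Python. You can create a Pandas Data Frame from inputs such as lists, dictionaries, Series, NumPy ndarrays, or another Data Frame.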
1. Creating an Empty Data Frame
The most basic Data Frame you can create is an empty one, made by calling pd.DataFrame() with no arguments.
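A minimal sketch of the empty Data Frame described above; the variable name is illustrative:

```python
import pandas as pd

# Calling the constructor with no arguments gives a Data Frame
# with no rows and no columns
df = pd.DataFrame()

print(df)        # reports an empty DataFrame with no columns and no index
print(df.empty)  # True
```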
2. Creating a Data Frame from Lists
You can create a Data Frame from a single list or from a list of lists.
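Here is a short sketch of both cases. The names and numbers are made-up sample data, and the column names are passed through the constructor's columns argument:

```python
import pandas as pd

# A single list becomes one column; pandas labels that column 0 by default
data = [1, 2, 3, 4, 5]
df_single = pd.DataFrame(data)

# A list of lists gives one row per inner list; columns= names the columns
rows = [["Alex", 10], ["Bob", 12], ["Clarke", 13]]
df_multi = pd.DataFrame(rows, columns=["Name", "Age"])
print(df_multi)
```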
3. Creating a Data Frame from Dict of “ndarrays” or Lists
To create a Data Frame from a dict of ndarrays, all the ndarrays must be of the same length. If you pass an index, its length must equal the length of the arrays. If you don't, the index defaults to range(n), where n is the array length.
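The sketch below mixes a list and a NumPy array of the same length (four rows of illustrative data) and shows the frame with and without an explicit index. The labels rank1 to rank4 are hypothetical:

```python
import numpy as np
import pandas as pd

# Every list/array in the dict must have the same length (4 here)
data = {
    "Name": ["Tom", "Jack", "Steve", "Ricky"],
    "Age": np.array([28, 34, 29, 42]),
}

# No index passed: pandas uses the default index 0..n-1
df = pd.DataFrame(data)
print(df)

# Explicit index: its length must match the length of the arrays
df_labelled = pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])
```

Passing arrays of different lengths makes the constructor raise a ValueError.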
Here, the values 0, 1, 2, 3 are the default index labels, assigned to the rows as range(n).
In pandas, a dictionary is one of the easiest ways to create a Data Frame: pass it to the pd.DataFrame() constructor. A list of dictionaries can also be used to build a Data Frame; in that case, each dictionary in the list becomes one row of the Data Frame.
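A small sketch of the list-of-dictionaries case with illustrative values. Keys that are missing from a dictionary show up as NaN in that row:

```python
import pandas as pd

# Each dictionary becomes one row; the keys become column names
records = [
    {"a": 1, "b": 2},
    {"a": 5, "b": 10, "c": 20},
]
df = pd.DataFrame(records)

# Row 0 has no "c" key, so its "c" cell is NaN
print(df)
```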
Adding and removing columns and rows is a vital part of data manipulation. A new column can be added to a Data Frame with the indexing operator or the .assign() method, and a column can be removed with the .drop() method or the del statement. Rows can be removed with .drop(). Older tutorials add rows with .append(), but that method was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() does the same job.
Selecting data from a Data Frame is a critical step in data analysis. There are several ways to do it, including indexing, slicing, and boolean indexing. Indexing lets you pick a single element or a part of a Data Frame, slicing selects a range of rows, and boolean indexing selects the rows that meet a specific condition.
Data Frames can read data from many sources, including CSV files, Excel files, and SQL databases. Use the pd.read_csv(), pd.read_excel(), and pd.read_sql() functions to load data from these formats into a Data Frame.
You can perform basic statistical operations on your data with Data Frames, including mean, median, mode, standard deviation, and correlation, using the .mean(), .median(), .mode(), .std(), and .corr() methods.
Merging, joining, and concatenating multiple Data Frames is a frequent data analysis task. To combine several Data Frames into one, use the .merge() and .join() methods or the pd.concat() function.
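A minimal sketch of the column operations just described, using a made-up two-column frame. Row insertion and the .drop() method are covered in their own sections later on:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Indexing operator: adds column "C" to df in place
df["C"] = df["A"] + df["B"]

# .assign() returns a new Data Frame with the extra column,
# so the result is reassigned to df
df = df.assign(D=df["A"] * 10)

# del removes column "C" in place
del df["C"]
print(df)
```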
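The three selection styles side by side, as a sketch over illustrative data. .at[] is used here for the single-element lookup:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Ben", "Cal", "Dee"], "score": [88, 72, 95, 60]})

# Indexing: one column (returned as a Series) and one element by labels
scores = df["score"]
single_value = df.at[2, "name"]  # "Cal"

# Slicing: a slice inside [] selects rows, here positions 0 and 1
first_two = df[0:2]

# Boolean indexing: keep only the rows where the condition is True
high = df[df["score"] > 80]
print(high)
```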
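A self-contained sketch of two of these readers. An in-memory string stands in for a CSV file and an in-memory SQLite database stands in for a real one; both, along with the table and column names, are assumptions for illustration. pd.read_excel() follows the same pattern but needs an extra engine such as openpyxl installed, so it is left out here:

```python
import io
import sqlite3

import pandas as pd

# read_csv accepts a path or any file-like object
csv_text = "a,b\n1,2\n3,4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_sql takes a query plus a connection; sqlite3 connections are supported directly
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("pen", 3), ("ink", 5)])
df_sql = pd.read_sql("SELECT * FROM sales", conn)
conn.close()

print(df_csv)
print(df_sql)
```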
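A quick sketch of these methods on a small illustrative frame. Each one works column by column, except .corr(), which compares pairs of columns:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 9]})

print(df.mean())    # mean of each column
print(df.median())  # median of each column
print(df.mode())    # returns a Data Frame, since a column can have several modes
print(df.std())     # sample standard deviation (ddof=1) by default
print(df.corr())    # pairwise Pearson correlation between columns
```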
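A sketch contrasting the three tools on two illustrative frames that share a key column:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# merge: SQL-style join on a column (an inner join by default)
merged = left.merge(right, on="key")

# join: combines on the index, so the key column becomes the index first
joined = left.set_index("key").join(right.set_index("key"), how="left")

# concat: stacks frames along an axis (rows by default)
stacked = pd.concat([left, left], ignore_index=True)
```

The inner merge keeps only keys "a" and "b". The left join keeps "c" with a missing y value.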
The pd.DataFrame() constructor also lets you start from scratch when creating a Data Frame: the columns argument sets the column names and the dtype argument sets the data type.
Applying operations such as filtering, sorting, grouping, and aggregating to a Data Frame is critical for data analysis. Use the .filter(), .sort_values(), .groupby(), and .agg() methods for these operations. Note that .filter() selects rows or columns by their labels; to filter rows by their values, use boolean indexing.
Now that we've seen several ways to create Data Frames in Python, it's time to learn about the different operations on a Data Frame.
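A sketch of both arguments. The column names id and value are illustrative:

```python
import pandas as pd

# Start from scratch: column names only, no rows yet
empty = pd.DataFrame(columns=["id", "value"])

# Supply data as well, and force one dtype for every column
typed = pd.DataFrame([[1, 2], [3, 4]], columns=["id", "value"], dtype="float64")
print(typed.dtypes)
```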
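A compact sketch of the four methods on illustrative match data. The output names total and best are hypothetical and come from pandas' named-aggregation syntax in .agg():

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "points": [10, 7, 3, 12],
    "assists": [2, 5, 1, 4],
})

# .filter() keeps the columns whose labels are listed
subset = df.filter(items=["team", "points"])

# .sort_values() orders rows by a column, highest first here
ranked = df.sort_values("points", ascending=False)

# .groupby() + .agg(): one summary row per team
summary = df.groupby("team").agg(total=("points", "sum"), best=("points", "max"))
print(summary)
```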
1. Selecting an index or column from a Pandas Data Frame
It is important to know how to select an index or column before you can start adding, deleting, and renaming the components of a Data Frame. Suppose you want to access the value under index 0 in column ‘A’ of your Data Frame, and that value is 1. There are many ways to access it, but two of the most important ones are .loc[] and .iloc[].
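A sketch of both accessors. The original example frame is not shown, so the frame below is an assumption, built only so that the value at index 0 in column "A" is 1:

```python
import pandas as pd

# Assumed example frame: the value under index 0 in column "A" is 1
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# .loc[] selects by label: row label 0, column label "A"
by_label = df.loc[0, "A"]

# .iloc[] selects by integer position: first row, first column
by_position = df.iloc[0, 0]

print(by_label, by_position)
```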
So, as you can see, you can access a value either by its label (.loc[]) or by its position in the index and columns (.iloc[]). The same indexers, together with the indexing operator, can also select entire rows and columns.
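A sketch of row and column selection on the same assumed frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Rows: .loc takes an index label, .iloc takes a position
row_by_label = df.loc[0]      # the first row, as a Series
row_by_position = df.iloc[0]

# Columns: the indexing operator, or .loc with ":" meaning "all rows"
col_a = df["A"]
col_a_again = df.loc[:, "A"]
```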
2. How To Add an Index, Row, or Column to a Pandas DataFrame
Once you know how to access values and select columns from a Data Frame, you can learn how to add an index, row, or column to a Pandas Data Frame.
Adding an Index:
When creating a Data Frame, you can pass labels to the ‘index’ argument so that you can easily access rows by the labels you want. If you don't specify an index, a default numeric index is added that starts at 0 and runs to the last row of the Data Frame. Even with the default index in place, you can later turn a column into the index by calling the set_index() method.
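A sketch of both routes, using illustrative names as labels:

```python
import pandas as pd

# Pass the labels through the index argument when creating the frame
df = pd.DataFrame({"score": [90, 75]}, index=["alice", "bob"])

# Or turn an existing column into the index afterwards
people = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 75]})
people = people.set_index("name")  # returns a new frame; "name" leaves the columns

print(people.loc["bob", "score"])
```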
Adding a Row:
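Older versions of pandas let you add rows with the DataFrame.append() method. It was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat() instead.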
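A minimal sketch of adding a row with pd.concat(), assuming the new row is first wrapped in its own one-row Data Frame with matching column names:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Wrap the new row in a Data Frame and concatenate;
# ignore_index=True renumbers the result 0..n-1
new_rows = pd.DataFrame({"A": [5], "B": [6]})
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```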
You can also use .loc to insert a row: assigning values to an index label that does not exist yet adds a new row to the Data Frame.
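A short sketch of this "setting with enlargement" behaviour:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Label 2 does not exist yet, so this assignment appends a new row
df.loc[2] = [5, 6]
print(df)
```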
Adding a Column:
If you want to make the index part of the Data Frame, assign the .index property to a column, either an existing one or one that hasn't been created yet.
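A sketch that copies the index into a new column. The frame and its non-default index labels are illustrative, and the column name "D" is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]}, index=[2, 5, 6])

# Column "D" does not exist yet; assigning df.index creates it
df["D"] = df.index
print(df)
```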
You can also add columns with .loc[], the same indexer you would use to add a row. .iloc[] cannot do this: it only accesses existing positions and raises an IndexError when asked to enlarge the Data Frame.
With .loc[ ], you can add a Series to an existing DataFrame. Since a Series object is quite similar to a column of a Data Frame, it is very easy to add a Series to an existing Data Frame.
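A sketch of adding a Series as a new column with .loc[]. The frame and the column name "C" are illustrative. The Series is given the frame's own index, because pandas aligns it to the rows by label:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]}, index=[2, 5, 6])

# ":" selects all rows; the new label "C" creates the column,
# and the Series values are matched to rows by index label
df.loc[:, "C"] = pd.Series([10, 20, 30], index=df.index)
print(df)
```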
3. How To Reset The Index of A Data Frame?
If the index of a Data Frame doesn't turn out the way you want, you can reset it with the .reset_index() method.
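A sketch of the two common forms, on a frame with illustrative string labels:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7]}, index=["x", "y", "z"])

# The old (unnamed) index becomes a column called "index";
# the new index is 0..n-1
reset = df.reset_index()

# drop=True discards the old index instead of keeping it as a column
reset_dropped = df.reset_index(drop=True)
```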
4. How To Delete an Index, Row, or Column from a Pandas DataFrame
Deleting an index
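A Data Frame always has an index, so it cannot be removed outright. Below is a sketch of two common workarounds, assuming "deleting" means either discarding the current labels or only removing the index's name:

```python
import pandas as pd

df = pd.DataFrame({"name": ["ann", "ben"], "score": [3, 5]}).set_index("name")

# Discard the current labels; a default 0..n-1 index replaces them
no_labels = df.reset_index(drop=True)

# Keep the labels but remove the index's name
unnamed = df.rename_axis(None)
```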
Deleting a column
For removing columns from a Data Frame, you can use the drop() function.
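A sketch of dropping columns with an illustrative frame. drop() returns a new Data Frame and leaves the original unchanged:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# axis=1 tells drop() to look for the label among the columns
without_a = df.drop("A", axis=1)

# Equivalent keyword form, dropping several columns at once
without_b_c = df.drop(columns=["B", "C"])
```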
Deleting a row
To delete a row from a Data Frame, pass the index label of that row to the drop() method. The .index property is a convenient way to look up a row's label from its position.
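A sketch with illustrative string labels, so the difference between position and label is visible:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]}, index=["x", "y", "z"])

# df.index[1] looks up the label at position 1 ("y"); drop() removes that row
dropped = df.drop(df.index[1])

# Several rows can be dropped at once by passing a list of labels
dropped_two = df.drop(index=["x", "z"])
```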
To delete duplicate rows, use the df.drop_duplicates() method.
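A sketch with one fully duplicated row in illustrative data. The subset and keep arguments are optional:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2], "B": [3, 3, 4]})

# By default, keeps the first occurrence of each fully duplicated row
deduped = df.drop_duplicates()

# subset= compares only the listed columns; keep="last" keeps the last occurrence
deduped_a = df.drop_duplicates(subset=["A"], keep="last")
```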
Sources: Tutorialspoint, DataCamp
So, there you have it: a basic tutorial on Data Frames in Python using Pandas.
If you're interested in learning Python and data science, check out IIIT-B & upGrad's PG Diploma in Data Science, which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 sessions with industry mentors, 400+ hours of learning, and job assistance with top firms.