In this segment, you will learn how to merge and concatenate multiple dataframes. In a real-world scenario, you would rarely have the entire data stored in a single table to load into a dataframe. You would have to load data into Python through multiple dataframes and then find a way to bring everything together.
This is why merge and append are one of the most common operations performed. You will now learn how to perform these tasks using different dataframes. First, let’s start with merging. You will be working with the sales data spread across two sheets in one excel file - 'sales_returns.xlsx'. You can download it from the link below.
Refer to this notebook for the segment.
You can use the following command to merge two dataframes:
dataframe_1.merge(dataframe_2, on = ['column_1', 'column_2'], how = '____')
In the next video, we will take a look at the useful attribute ‘how’ offered by the merge function.
The how
attribute in the code above specifies the type of merge to be performed:
left
: This will select the entries only in the first dataframe.right
: This will consider the entries only in the second dataframe.outer
: This takes the union of all the entries in the dataframes.inner
: This will result in the intersection of the keys from both frames.Depending on the situation, you can use an appropriate method to merge the two dataframes.
Concatenation is much more straightforward than merging. It is used when you have dataframes with the same columns and want to stack them on top of each other, or with the same rows and want to append them side-by-side.
You can add columns or rows from one dataframe to another using the concat function:
pd.concat([dataframe_1, dataframe_2], axis = _)
To append rows, you have to set the axis value as 0. For adding columns from one dataframe to another, the axis value must be set as 1. If there are any extra columns or rows where there are no values, they are replaced with ‘NaN’.
You can also perform various mathematical operations between two or more dataframes. For example, you may have two dataframes for storing the sales information for 2018 and 2019. Now, you want the sales data combined over a period of 2 years. In such a case, the add function in Pandas allows you to directly combine the two dataframes easily.
Apart from the merge, append or concat, you can also perform mathematical operations to combine multiple dataframes. When two dataframes have the same row and column labels, you can directly use the mathematical operators provided in the list below:
add()
: +sub()
: -mul()
: *div()
: /floordiv()
: //mod()
: %pow()
:**
Pandas will return the derived values with the same labels in a combined dataframe. It also provides the attribute fill_value
to control whether how you want to deal with the values that are not common between two dataframes. You can refer to the documentation for the same.
Let's see an example of the add()
function.
In the next segment, you will learn how to create pivot tables using Pandas.