Once the data is loaded into a dataframe, it may not be in a format that you can use directly. You may have to modify existing entries or generate new ones from the existing data to get the desired format. Let’s take a look at the features that the Pandas library offers in this respect.
You can use the Notebook provided in the previous segment to code along with the instructor. Make sure you type out the code yourself, as doing so will help you understand it better.
Note: [03:19] There is a mistake in the video. The entries in the column ‘Sales_in_thousand’ do not reflect the updated entries.
In the video above, you learnt how to edit an existing column and change its name. The Pandas library offers a wide range of functions to modify the data in the dataframes. To learn more about such binary operations, you can visit this link.
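You can use the following code to rename a row label, a column, or both. Note that rename() returns a new dataframe, so assign the result back if you want to keep the change:
dataframe = dataframe.rename(index={row_index: "new_row_name"}, columns={column_name: "new_column_name"})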
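As a concrete illustration of the rename pattern, here is a minimal sketch. The dataframe, its values, the row label "first" and the column name "Sales" are made up for this example and are not taken from the course Notebook.

```python
import pandas as pd

# Hypothetical data, used only to illustrate rename().
sales = pd.DataFrame(
    {"Market": ["Africa", "Asia"], "Sales": [1200.0, 3400.0]},
    index=[0, 1],
)

# Map old labels to new ones; labels not mentioned are left as they are.
renamed = sales.rename(
    index={0: "first"},
    columns={"Sales": "Sales_in_thousand"},
)

print(list(renamed.columns))
print(list(renamed.index))
print(list(sales.columns))  # the original dataframe is not modified
```

Because rename() leaves the original dataframe untouched by default, the renamed version is stored in a separate variable here.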
Another way to modify an existing column or create a new one is by using lambda functions.
Suppose you want to create a new column ‘Positive Profit’, which replaces the negative values in ‘Profit’ with NaN. You need to apply a function that returns NaN if Profit < 0 and returns the value itself otherwise. This can be done easily by using the apply() method on a column of the dataframe.
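The ‘Positive Profit’ column described above could be built roughly as follows. This is a sketch on a small made-up dataframe; the profit values are assumptions, and only the column names come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical profit values, used only for illustration.
df = pd.DataFrame({"Profit": [120.5, -30.0, 0.0, -4.2]})

# apply() calls the lambda once per value in the 'Profit' column:
# negative values become NaN, everything else is kept unchanged.
df["Positive Profit"] = df["Profit"].apply(lambda x: np.nan if x < 0 else x)

# An equivalent vectorised alternative (not shown in the lesson):
# where() keeps values where the condition is True and puts NaN elsewhere.
alt = df["Profit"].where(df["Profit"] >= 0)

print(df)
```

For large dataframes, the vectorised where() form is usually faster than apply() with a lambda, although both give the same result in this case.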
Columns that you create from the existing data are known as 'Derived Variables'. Derived variables increase the information conveyed by the dataframe. Now, you can use lambda functions to modify the dataframes.
You have learnt how to create multilevel indices in the earlier segment while loading data in Pandas. However, you can also alter the index column after loading the data into a dataframe. Let’s watch the following video and learn how to do that and fetch entries from these dataframes.
You can use the following code to set a multilevel index in a dataframe:
dataframe.set_index([column_1, column_2])
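The sketch below shows set_index() with two columns on a small made-up dataframe. The column names "Market" and "Region" and all the values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data, used only to illustrate set_index().
sales = pd.DataFrame({
    "Market": ["Africa", "Africa", "Asia", "Asia"],
    "Region": ["Southern Africa", "Western Africa", "Central Asia", "Eastern Asia"],
    "Sales": [20.0, 10.0, 30.0, 40.0],
    "Profit": [-2.0, 1.0, 3.0, 4.0],
})

# Passing a list of columns builds a multilevel index;
# those columns are moved out of the data and into the index.
indexed = sales.set_index(["Market", "Region"])
print(indexed.index.names)
print(list(indexed.columns))

# reset_index() reverses the operation and restores the columns.
restored = indexed.reset_index()
print(list(restored.columns))
```

Like rename(), set_index() returns a new dataframe by default, so the result is assigned to a new variable.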
To obtain data from such dataframes, you have to provide each row label as a tuple of the form (label, sub_label) and place these tuples inside a list. You can go through the code provided below for reference:
dataframe.loc[[(label_1, sub_label_1), (label_1, sub_label_2)], [column_label_1, column_label_2]]
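Here is a small sketch of label-based selection on a multilevel index. The index level names, the labels and the values are made up for this example.

```python
import pandas as pd

# Hypothetical dataframe with a two-level row index (Market, Region).
sales = pd.DataFrame(
    {"Sales": [20.0, 10.0, 30.0, 40.0], "Profit": [-2.0, 1.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Africa", "Southern Africa"),
            ("Africa", "Western Africa"),
            ("Asia", "Central Asia"),
            ("Asia", "Eastern Asia"),
        ],
        names=["Market", "Region"],
    ),
)

# Each row is addressed by a (label, sub_label) tuple; the tuples go in a list.
subset = sales.loc[
    [("Africa", "Southern Africa"), ("Africa", "Western Africa")],
    ["Sales", "Profit"],
]
print(subset)

# Passing only the outer label selects every row under it.
one_market = sales.loc["Africa"]
print(one_market)
```

When only the outer label is given, the result is indexed by the remaining inner level alone.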