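There are multiple ways to select rows and columns from a dataframe or series. In this segment, you will learn how to select rows, select columns, extract specific entries using labels or positions, and subset a dataframe based on conditions.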
Selection of rows in dataframes is similar to the indexing that you saw in NumPy arrays. The syntax df[start_index:end_index] will subset the rows according to the start and end indices. As with NumPy slices, the row at position end_index is excluded.
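The row-slicing syntax above can be sketched as follows. This is a minimal illustration that assumes a small, made-up dataframe; the column names Sales and Profit and all values are hypothetical and are not taken from the course notebook.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "Sales": [1200, 3500, 2500, 800, 4100],
        "Profit": [150, 420, 60, -30, 510],
    }
)

# Select the rows at positions 1, 2 and 3; position 4 (the end index) is excluded.
subset = df[1:4]
print(subset)
```

Every column is kept in the result; only the rows are restricted.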
Note that this syntax returns all the columns for each selected row. Because dataframes have column labels, selecting columns no longer works the way it does in arrays. Let’s watch the next video and learn how to select the required column(s) from a dataframe.
You can continue from the 'Indexing and Slicing' section of the notebook downloaded in the previous segment.
You can select one or more columns from a dataframe using the following commands:
df['column'] or df.column: It returns a series.
df[['col_x', 'col_y']]: It returns a dataframe.
The methods taught above allow you to extract either all of the columns of particular rows or all of the rows for a particular column. But how would you extract a specific column from a specific row?
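Before moving on, here is a short sketch of the column-selection commands listed above. The dataframe and its column names (Sales, Profit, Region) are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "Sales": [1200, 3500, 2500],
        "Profit": [150, 420, 60],
        "Region": ["East", "West", "South"],
    }
)

# A single column label returns a series.
sales = df["Sales"]

# Attribute access returns the same series. It works only when the column
# name is a valid Python identifier and does not clash with an existing
# dataframe attribute.
sales_attr = df.Sales

# A list of column labels returns a dataframe, even for a single column.
two_cols = df[["Sales", "Profit"]]
one_col_frame = df[["Sales"]]

print(type(sales).__name__)
print(type(two_cols).__name__)
```

The double brackets are simply a Python list placed inside the indexing brackets, which is why the result stays two-dimensional.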
Let’s watch the following video and learn how to do this over Pandas dataframes.
You can use the loc indexer to extract rows and columns from a dataframe based on their labels, as follows:
dataframe.loc[[list_of_row_labels], [list_of_column_labels]]
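As a rough sketch of label-based indexing, the example below assumes a made-up dataframe whose row labels (Ord_1, Ord_2, Ord_3) and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with string row labels, used only for illustration.
df = pd.DataFrame(
    {"Sales": [1200, 3500, 2500], "Profit": [150, 420, 60]},
    index=["Ord_1", "Ord_2", "Ord_3"],
)

# Lists of row and column labels return a dataframe.
subset = df.loc[["Ord_1", "Ord_3"], ["Sales"]]

# A single row label and a single column label return one value.
value = df.loc["Ord_2", "Profit"]  # 420

# Label slices include both endpoints, unlike positional slices.
label_slice = df.loc["Ord_1":"Ord_2", :]

print(subset)
print(value)
print(label_slice)
```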
This is called label-based indexing over dataframes. Now, you may face some challenges while dealing with the labels. As a solution, you might want to fetch data based on the row or column number. Let’s see how that is possible over dataframes.
As you learnt in the video, another way to index a dataframe is the iloc indexer, which uses the integer positions of rows and columns instead of their labels:
dataframe.iloc[rows, columns]
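The same kind of selection can be sketched with positions. The dataframe below is hypothetical; note that the row labels play no role here, only the integer positions do.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"Sales": [1200, 3500, 2500], "Profit": [150, 420, 60]},
    index=["Ord_1", "Ord_2", "Ord_3"],
)

# Rows at positions 0 and 2, column at position 1 (Profit).
subset = df.iloc[[0, 2], [1]]

# A single row position and a single column position return one value.
value = df.iloc[1, 0]  # 3500

# Positional slices exclude the end position, as in NumPy.
first_two_rows = df.iloc[0:2, :]

print(subset)
print(value)
print(first_two_rows)
```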
Since we use positions instead of labels to extract values from the dataframe, it is called position-based indexing. With these two methods, you can easily extract the required entries from a dataframe based on their labels or positions. Now, let's see how to subset a dataframe based on certain conditions.
Often, you want to select rows that satisfy some given conditions. For example, you may want to select all orders where Sales > 3,000, or all orders where 2,000 < Sales < 3,000 and Profit < 100. Arguably, the best way to perform these operations is to use df.loc[], since df.iloc[] would require you to remember the integer column indices, which is tedious. Let’s start first with one condition to filter the elements in the dataframe.
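A single-condition filter might look like the sketch below. The data is made up, and Sales and Profit are used only because they appear in the example conditions above.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "Sales": [1200, 3500, 2500, 800, 4100],
        "Profit": [150, 420, 60, -30, 510],
    }
)

# The comparison produces a boolean series with one True/False per row.
mask = df["Sales"] > 3000

# Passing the boolean series to loc keeps only the rows marked True.
high_sales = df.loc[mask]

# loc also lets you restrict the columns by label in the same step.
profit_of_high_sales = df.loc[df["Sales"] > 3000, ["Profit"]]

print(high_sales)
print(profit_of_high_sales)
```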
Now, let’s apply multiple conditions and try to fetch the entries that match the criteria.
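One way to combine conditions is sketched below, again on made-up data. Each condition goes in its own parentheses and the conditions are joined with & (and) or | (or), because Python's plain and/or keywords do not work element-wise on a series.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "Sales": [1200, 3500, 2500, 800, 4100],
        "Profit": [150, 420, 60, -30, 510],
    }
)

# Orders where 2,000 < Sales < 3,000 and Profit < 100.
in_range = df.loc[
    (df["Sales"] > 2000) & (df["Sales"] < 3000) & (df["Profit"] < 100)
]

# Orders where Sales > 3,000 or Profit is negative.
either = df.loc[(df["Sales"] > 3000) | (df["Profit"] < 0)]

print(in_range)
print(either)
```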
As you can see, you can easily filter the entries based on the multiple conditions provided. A key takeaway here is the set of helper features that Pandas offers, for example, isin(), which was used in the video. Similar to isin() is isna(), which checks whether each element in a dataframe is missing (for example, NaN or None). Note that an empty string is not treated as missing.
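A brief sketch of both helpers follows. The Region and Profit columns and their values are hypothetical; the missing values are created with float("nan") purely for illustration.

```python
import pandas as pd

# Hypothetical sample data with two missing Profit values.
df = pd.DataFrame(
    {
        "Region": ["East", "West", "South", "East"],
        "Profit": [150.0, float("nan"), 60.0, float("nan")],
    }
)

# isin() marks the rows whose value appears in the given list.
east_or_south = df.loc[df["Region"].isin(["East", "South"])]

# isna() marks the missing entries.
missing_profit = df["Profit"].isna()

# The boolean result can be used as a filter like any other condition.
rows_with_profit = df.loc[~missing_profit]

print(east_or_south)
print(missing_profit)
print(rows_with_profit)
```

The ~ operator inverts a boolean series, so the last selection keeps the rows where Profit is present.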
In the next segment, you will learn how to run operations over the dataframes; this will help you create or modify the stored data.