Indexing and Slicing in Python


There are multiple ways to select rows and columns from a dataframe or series. In this segment, you will learn how to:

  • Select rows from a dataframe
  • Select columns from a dataframe
  • Select subsets of dataframes

 

Selecting rows from a dataframe is similar to the indexing that you saw in NumPy arrays. The syntax df[start_index:end_index] returns the rows from position start_index up to, but not including, position end_index.
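As a minimal sketch of row slicing, the example below builds a small dataframe and selects two rows by position. The data and the column names Sales and Profit are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "Sales": [1200, 3500, 2500, 800, 4100],
        "Profit": [150, 420, 90, -30, 610],
    }
)

# Rows at positions 1 and 2; the end index (3) is excluded, as in NumPy.
subset = df[1:3]
print(subset)
```

The result keeps every column of the dataframe; only the rows are restricted.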

 

Note that this slicing syntax returns all the columns for the selected rows. Because dataframes have column labels, selecting columns no longer works the way it does in arrays. Let’s watch the next video and learn how to select the required column(s) from a dataframe.

 

You can continue from the 'Indexing and Slicing' section of the notebook downloaded in the previous segment.

[Video: MLC_0.2.3__Ver 03_Video 10]

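You can select one or more columns from a dataframe using the following commands:

  • df['column'] or df.column: returns a series (the df.column form works only when the column name is a valid Python identifier that does not clash with an existing dataframe attribute)
  • df[['col_x', 'col_y']]: returns a dataframe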
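The following sketch illustrates the difference between the two forms. The dataframe and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "Sales": [1200, 3500, 2500],
        "Profit": [150, 420, 90],
        "Region": ["East", "West", "South"],
    }
)

# Single brackets with one column name return a Series.
sales = df["Sales"]

# Attribute access returns the same Series.
sales_attr = df.Sales

# Double brackets (a list of names) return a DataFrame.
two_cols = df[["Sales", "Profit"]]

# A one-element list still returns a DataFrame, not a Series.
one_col_df = df[["Sales"]]

print(type(sales).__name__, type(two_cols).__name__, type(one_col_df).__name__)
```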

The methods above let you extract either all the columns for particular rows or all the rows for particular columns. But how would you extract a specific column from a specific row?

Let’s watch the following video and learn how to do this with Pandas dataframes.

[Video: MLC_0.2.3__Ver 03_Video 11]

You can use the loc indexer to extract rows and columns from a dataframe based on their labels:

dataframe.loc[[list_of_row_labels], [list_of_column_labels]]
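Here is a small sketch of loc in use. The row labels (ord_1, ord_2, ord_3) and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame(
    {"Sales": [1200, 3500, 2500], "Profit": [150, 420, 90]},
    index=["ord_1", "ord_2", "ord_3"],
)

# A list of row labels and a list of column labels returns a dataframe.
block = df.loc[["ord_1", "ord_3"], ["Sales"]]

# A single row label and a single column label return one value.
value = df.loc["ord_2", "Profit"]

# Label slices include both endpoints, unlike positional slices.
rows = df.loc["ord_1":"ord_2", :]

print(block)
print(value)
print(rows)
```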
          

           

This is called label-based indexing. Labels are not always convenient to work with, however, so you may instead want to fetch data based on the row or column number. Let’s see how this is done with dataframes.

[Video: MLC_0.2.3__Ver 03_Video 12]

As you learnt in the video, another way to index a dataframe is the iloc indexer, which uses row and column numbers (counted from 0) instead of labels:

dataframe.iloc[rows, columns]
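The sketch below shows a few common iloc patterns on a hypothetical dataframe. The positions refer to the row and column order, regardless of the labels.

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame(
    {"Sales": [1200, 3500, 2500], "Profit": [150, 420, 90]},
    index=["ord_1", "ord_2", "ord_3"],
)

# Single cell: second row (position 1), first column (position 0).
cell = df.iloc[1, 0]

# Positional slices exclude the end position: rows 0 and 1, all columns.
first_two = df.iloc[0:2, :]

# Lists of positions: rows 0 and 2, and the column at position 1 (Profit).
picked = df.iloc[[0, 2], [1]]

# Negative positions count from the end: the last row as a Series.
last_row = df.iloc[-1]

print(cell)
print(first_two)
print(picked)
print(last_row)
```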
              

               

Because it uses positions instead of labels to extract values from the dataframe, this is called position-based indexing. With loc and iloc, you can easily extract the required entries from a dataframe based on their labels or positions. Now, let's see how to subset a dataframe based on certain conditions.


Subsetting Rows Based on Conditions

Often, you want to select rows that satisfy some given conditions. For example, you may want to select all orders where Sales > 3,000, or all orders where 2,000 < Sales < 3,000 and Profit < 100. Arguably, the best way to perform these operations is to use df.loc[], since df.iloc[] would require you to remember the integer column indices, which is tedious. Let’s start with a single condition to filter the rows of the dataframe.
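A single condition such as Sales > 3,000 could be sketched as follows. The data is hypothetical; the comparison produces a boolean series, which df.loc[] then uses to keep only the rows marked True.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "Sales": [1200, 3500, 2500, 800, 4100],
        "Profit": [150, 420, 90, -30, 610],
    }
)

# The comparison yields one True/False value per row.
mask = df["Sales"] > 3000

# Keep only the rows where the mask is True (all columns).
high_sales = df.loc[mask]

# The same filter, restricted to selected columns by label.
high_sales_profit = df.loc[mask, ["Profit"]]

print(high_sales)
print(high_sales_profit)
```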

[Video: MLC_0.2.3__Ver 03_Video 13]

Now, let’s apply multiple conditions and fetch the entries that match the criteria.
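As a sketch of the two-condition example mentioned earlier (2,000 < Sales < 3,000 and Profit < 100), conditions can be combined with & (and) or | (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. The data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "Sales": [1200, 3500, 2500, 800, 4100, 2800],
        "Profit": [150, 420, 90, -30, 610, 300],
    }
)

# Two comparisons combined with & give 2,000 < Sales < 3,000.
in_range = (df["Sales"] > 2000) & (df["Sales"] < 3000)
low_profit = df["Profit"] < 100

# Rows that satisfy both conditions.
matches = df.loc[in_range & low_profit]

# Rows that satisfy at least one of two conditions.
either = df.loc[(df["Sales"] > 3000) | (df["Profit"] < 0)]

print(matches)
print(either)
```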

[Video: MLC_0.2.3__Ver 03_Video 14]

As you can see, you can easily filter entries based on multiple conditions. A key takeaway is the set of helper methods that Pandas offers, such as isin(), which you saw in the video and which checks whether each element is present in a given list of values. A similar method is isna(), which checks whether each element in a dataframe is missing (NaN or None); note that an empty string is not treated as missing.
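The following sketch shows isin() and isna() used as row filters. The dataframe, its column names, and the region values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; two Sales values are deliberately missing.
df = pd.DataFrame(
    {
        "Region": ["East", "West", "South", "East", "North"],
        "Sales": [1200.0, float("nan"), 2500.0, 800.0, float("nan")],
    }
)

# isin() is True where the value appears in the given list.
east_or_west = df.loc[df["Region"].isin(["East", "West"])]

# isna() is True where the value is missing.
missing_sales = df.loc[df["Sales"].isna()]

# The ~ operator negates a boolean mask: rows with a known Sales value.
known_sales = df.loc[~df["Sales"].isna()]

print(east_or_west)
print(missing_sales)
print(known_sales)
```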


In the next segment, you will learn how to run operations over dataframes; this will help you create or modify the stored data.