In the previous segment, you learnt how to load data into a dataframe, and manipulate the indices and headers to represent the data in a meaningful manner. In this segment, you will learn some basic functions that will be useful for describing the data stored in the dataframes. You will be working with the sales dataset provided below.
You can use the Jupyter Notebook provided below to code along with the instructor. The same notebook will be used in a few upcoming segments as well.
While working with Pandas, the dataframes may hold large volumes of data, and printing the entire dataframe every time you want to check the result of an operation would be inefficient. Instead, you can use the following code to display a limited number of entries:
dataframe_name.head()
By default, it returns the first five rows, although you can pass a number if you want fewer or more rows to be displayed, for example, dataframe_name.head(10). Similarly, to display the last entries, you can use the tail() method instead of head().
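The two methods can be tried on a small dataframe. The sketch below is a minimal illustration, assuming a made-up dataframe named sales with hypothetical Market, Sales and Profit columns; it is a stand-in for the course's sales dataset, not the dataset itself.

```python
import pandas as pd

# Hypothetical stand-in for the sales dataset; names and values are made up.
sales = pd.DataFrame({
    "Market": ["Africa", "Asia", "Europe", "LATAM", "USCA", "Africa", "Asia", "Europe"],
    "Sales": [120.0, 340.5, 275.0, 90.25, 410.0, 150.0, 220.0, 310.75],
    "Profit": [20.0, 55.5, -10.0, 12.25, 80.0, 18.0, -5.0, 42.5],
})

first_rows = sales.head()    # first five rows (the default)
first_three = sales.head(3)  # first three rows
last_two = sales.tail(2)     # last two rows

print(first_rows)
print(first_three)
print(last_two)
```

Both methods return a new, smaller dataframe, so the result can be stored in a variable or passed on to other methods.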
Now we will learn about two other functions, namely, info() and describe(), that help you understand the data better.
In the video, you learnt about two commands:
dataframe.info(): This method prints information about the dataframe, which includes the index data type and column data types, the count of non-null values and the memory used.
dataframe.describe(): This function produces descriptive statistics for the dataframe, that is, the central tendency (mean and median), the dispersion (standard deviation and quartiles) and the range (min and max) of the data. By default, it summarises only the numeric columns of a dataframe that holds mixed data types; passing include='all' also summarises the non-numeric columns, reporting the count, the number of unique values, the most frequent value and its frequency.
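As a rough illustration of both commands, the sketch below runs them on a small made-up dataframe. The sales name, its columns and its values are assumptions chosen for the example, with one missing Profit value included so that the non-null counts differ between columns.

```python
import pandas as pd

# Hypothetical data; the None in Profit becomes a missing value (NaN).
sales = pd.DataFrame({
    "Market": ["Africa", "Asia", "Europe", "Asia", "Africa"],
    "Sales": [120.0, 340.5, 275.0, 90.0, 410.0],
    "Profit": [20.0, 55.5, None, 12.0, 80.0],
})

# info() prints its report directly and returns None.
sales.info()

# describe() returns a dataframe of statistics; by default it covers
# only the numeric columns (Sales and Profit here).
numeric_summary = sales.describe()
print(numeric_summary)

# include="all" adds the non-numeric Market column to the summary.
full_summary = sales.describe(include="all")
print(full_summary)
```

In the numeric summary, the row labelled 50% is the median, and the count row ignores missing values, which is why Profit reports one entry fewer than Sales.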
Let’s try to visually understand the findings of the describe function using a box plot.
[Note: In the following video, the instructor mistakenly refers to the median as the mean at 2:05 and 2:18.]
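To connect the output of describe() with what a box plot draws, the sketch below computes the quartiles from a made-up series of sales values. The values are hypothetical, and the 1.5 × IQR rule used for the whisker limits is the common convention (and matplotlib's default), assumed here rather than taken from the video.

```python
import pandas as pd

# Hypothetical sales values with one unusually large entry.
sales_values = pd.Series([90.0, 120.0, 275.0, 340.5, 410.0, 1500.0], name="Sales")

summary = sales_values.describe()

# The box spans the 25% and 75% rows; the line inside it is the 50% row,
# which is the median, not the mean.
q1, median, q3 = summary["25%"], summary["50%"], summary["75%"]
iqr = q3 - q1  # interquartile range, the height of the box

# Points beyond these limits are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sales_values[(sales_values < lower_fence) | (sales_values > upper_fence)]

print("median:", median, "mean:", summary["mean"])
print("box from", q1, "to", q3)
print("outliers:", list(outliers))

# With matplotlib installed, sales_values.plot(kind="box") draws the plot itself.
```

In this example, the single large value pulls the mean well above the median, which shows why the two should not be confused when reading a box plot.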
In the next segment, you will learn how to slice and index the data in a dataframe.