There are two main data structures in Pandas: Series and dataframes.
Let’s watch the following video where Vaidehi talks about these data structures in detail.
The most basic object in Pandas is a Series. To visualise it easily, a Series can be thought of as a one-dimensional (1D) NumPy array with a label and an index attached to it. A Series can also hold non-numeric data (characters, dates, times, booleans, etc.). Usually, you will work with Series only as part of dataframes.
You can create a Pandas Series from an array-like object using the following command. Pass dtype as a keyword argument, because the second positional argument of pd.Series is the index, not the data type:
pd.Series(data, dtype=dtype)
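As a minimal sketch of the command above, the following snippet builds a Series from a plain Python list. The values, the index labels and the name 'marks' are made up for illustration and do not come from the course material.

```python
import pandas as pd

# Illustrative data only: any array-like object (list, tuple, NumPy array) works
marks = pd.Series([72, 85, 90], dtype="float64")

print(marks.dtype)        # float64
print(list(marks.index))  # default integer index: [0, 1, 2]

# An optional index and name give the Series its labels
labelled = pd.Series([72, 85, 90], index=["a", "b", "c"], name="marks")
print(labelled["b"])      # 85

# A Series can also hold non-numeric data such as strings
grades = pd.Series(["pass", "pass", "distinction"])
```

If you omit dtype, Pandas infers a data type from the values.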
A dataframe is the most widely used data structure in data analysis. It is a table with rows and columns, where each row has an index label and each column has a meaningful name. There are various ways of creating dataframes, for instance, creating them from dictionaries, reading from .txt and .csv files, etc. Let’s take a look at them one by one.
Creating dataframes from dictionaries
If your data is already available in Python as lists, you can create a dataframe directly from a dictionary of those lists. Each ‘key’ in the dictionary becomes a column name, and the list stored as its ‘value’ supplies the entries under that column. All the lists must have the same length.
You can refer to the Notebook provided below for this segment.
To create a dataframe from a dictionary, you can run the following command:
pd.DataFrame(dictionary_name)
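The snippet below is a small sketch of this approach. The dictionary and its contents ('name', 'age' and the values) are hypothetical and used only to show how keys map to columns.

```python
import pandas as pd

# Hypothetical data: each key becomes a column name,
# and each list holds the entries for that column
people = {
    "name": ["Asha", "Ben", "Chen"],
    "age": [31, 25, 42],
}

df = pd.DataFrame(people)

print(df.shape)          # (3, 2): three rows, two columns
print(list(df.columns))  # ['name', 'age']
print(list(df.index))    # default row index: [0, 1, 2]
```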
You can also create dataframes from lists or arrays. In that case, specify the column names as shown below; otherwise, Pandas assigns default integer labels (0, 1, ...) to the columns.
pd.DataFrame(list_or_array, columns = ['column_1', 'column_2'])
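To illustrate, here is a rough sketch that builds a dataframe from a list of rows, once with column names and once without. The row values and the column names are invented for the example.

```python
import pandas as pd

# Hypothetical data: each inner list is one row of the table
rows = [["Asha", 31], ["Ben", 25], ["Chen", 42]]

# Column names supplied explicitly
df = pd.DataFrame(rows, columns=["name", "age"])
print(list(df.columns))          # ['name', 'age']

# Without the columns argument, the labels are plain integers
df_default = pd.DataFrame(rows)
print(list(df_default.columns))  # [0, 1]
```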
Creating dataframes from external files
Another method to create dataframes is to load data from external files. Data is not always available in the form of lists. In most cases, you will have to load data stored in a CSV file, a text file, etc. Let’s watch the next video and learn how to do that.
Download the provided file 'cars.csv' before you proceed.
Pandas provides the flexibility to load data from various sources and has a different command for each of them. You can go through the list of commands here. The most common files that you will work with are CSV files. You can use the following command to load data from a CSV file into a dataframe:
pd.read_csv(filepath, sep=',', header='infer')
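You can specify the following details: the path to the file (filepath), the delimiter that separates the values (sep) and the row to use for the column names (header). With header='infer', the default, the column names are taken from the first line of the file.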
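As a hedged sketch of how these arguments behave, the snippet below reads a tiny semicolon-separated table from an in-memory string instead of a file on disk, so that it runs without any download. The text and its columns ('model', 'price') are made up and are not the contents of 'cars.csv'.

```python
import io

import pandas as pd

# Made-up sample text standing in for a file; values are separated by ';'
csv_text = "model;price\nalpha;100\nbeta;250\n"

# sep tells Pandas which delimiter to split on;
# header='infer' takes the column names from the first line
df = pd.read_csv(io.StringIO(csv_text), sep=";", header="infer")
print(list(df.columns))   # ['model', 'price']
print(df.shape)           # (2, 2)

# header=None treats the first line as data and uses integer column labels
raw = pd.read_csv(io.StringIO(csv_text), sep=";", header=None)
print(list(raw.columns))  # [0, 1]
print(raw.shape)          # (3, 2)
```

For a real file, you would pass its path as the first argument in place of the io.StringIO object.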
In the next segment, you will learn about row and column indices in a dataframe.