A scatterplot, as the name suggests, displays how the data points are scattered. It helps you check for a relationship between two quantitative variables and detect outliers in them.
Let’s watch the video below to understand why a scatterplot would be better when dealing with two quantitative variables. The Jupyter Notebook used in this segment is provided below. Code along with the instructor for maximum benefit.
You can use the following command to build a scatterplot:
plt.scatter(x_axis, y_axis)
This plots one point for each pair of values, positioned according to its values on the x- and y-axes. Now, as in the earlier segment, think about how you can add elements such as axis labels and a title to this scatterplot.
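As a rough illustration of the command above, here is a minimal sketch of a basic scatterplot with axis labels and a title added. The `sales` and `profit` values are made-up sample data, not the data set used in the video.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data: one sales value and one profit value per order
sales = [120, 340, 560, 210, 980, 450]
profit = [15, 40, 75, 20, -60, 55]

# Draw one marker for each (sales, profit) pair
points = plt.scatter(sales, profit)

# Extra elements that make the plot easier to read
plt.xlabel("Sales")
plt.ylabel("Profit")
plt.title("Sales vs Profit")
```

In a Jupyter Notebook the figure is displayed when the cell finishes; in a standalone script you would typically call plt.show() at the end.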
Matplotlib also lets you show a categorical distinction between the points on a scatterplot. You can colour-code the points by category so that the categories are easy to tell apart. Suppose you had three categories. Let’s see how this can be done.
You can run the scatter function with the following attributes to specify the colours and labels of the categories in the data set:
plt.scatter(x_axis, y_axis, c = color, label = labels)
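Here, x_axis and y_axis must be lists or arrays of equal length, and c can be either a single colour or a list or array with one colour per point. The label argument, however, takes a single string that names the whole set of points in the legend. To give each category its own colour and legend entry, call plt.scatter once per category and then call plt.legend().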
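One common way to colour-code three categories is to call plt.scatter once per category. The sketch below assumes made-up data; the `category` list, the category names and the colour mapping are hypothetical and chosen only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data: each point belongs to one of three product categories
sales = [120, 340, 560, 210, 980, 450]
profit = [15, 40, 75, 20, -60, 55]
category = ["Furniture", "Technology", "Office",
            "Furniture", "Technology", "Office"]

# Assumed mapping from category name to a Matplotlib colour
colours = {"Furniture": "tab:blue",
           "Technology": "tab:orange",
           "Office": "tab:green"}

plt.figure()
for name, colour in colours.items():
    # Keep only the points that belong to the current category
    xs = [s for s, c in zip(sales, category) if c == name]
    ys = [p for p, c in zip(profit, category) if c == name]
    # One scatter call per category: one colour and one legend label each
    plt.scatter(xs, ys, c=colour, label=name)

legend = plt.legend()  # builds the legend from the label of each call
```

Passing a list of colours to c in a single call also colours the points, but it produces only one legend entry, which is why the loop is used here.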
Another feature of a scatterplot is that the points can be distinguished along a further dimension by attaching text labels to them. Suppose you have another array, ‘country’, that tells you the country where each sale was made, and you want to highlight the points belonging to a particular country in the figure created above. Let’s see how we can do that.
[00:50] The instructor has missed 'p' at the start while copying. Do not make the same mistake while implementing the code.
As shown in the video, you can use the following command to attach a note (annotation) to a point in the scatterplot:
plt.annotate(text, xy = points_to_annotate_xy)
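To make the annotate command concrete, the sketch below labels only the points from one country. The data values and the contents of the `country` list are hypothetical, and "India" is just an example of a country to highlight.

```python
import matplotlib.pyplot as plt

# Hypothetical data: the country where each sale was made
sales = [120, 340, 560, 210, 980, 450]
profit = [15, 40, 75, 20, -60, 55]
country = ["India", "USA", "India", "UK", "USA", "UK"]

plt.figure()
plt.scatter(sales, profit)

# Attach a text note to every point that belongs to the chosen country
notes = []
for x, y, place in zip(sales, profit, country):
    if place == "India":
        # xy is the (x, y) position of the point being annotated
        notes.append(plt.annotate(place, xy=(x, y)))
```

Because plt.annotate places one note per call, the loop calls it once for each matching point.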
Having completed this segment, you should now understand that a scatterplot helps you visualise the relationship between two numeric variables. Matplotlib also offers features such as colour-coding and annotations that make these plots more descriptive by bringing in other variables associated with the data.
In the next segment, you will learn about two more types of graphs: line graphs and histograms.