Data Visualization in Python: Fundamental Plots Explained [With Graphical Illustration]
Updated on Jun 13, 2023 | 10 min read | 6.2k views
For any data scientist, aspiring or established, being able to explain research and analysis is an essential skill. This is where data visualization comes into the picture. It is vital to use this tool honestly, because poor design choices can easily misinform or deceive the audience.
As data scientists, we all have certain obligations in the matter of preserving what is true.
The first is that we should be completely honest with ourselves while cleaning and summarizing the data. Data pre-processing is a crucial step for any machine learning algorithm to work, so any dishonesty in the data can lead to drastically different results.
Another obligation is towards our target audience. There are various techniques in data visualization which are used to highlight specific sections of data and make some other pieces of data less prominent. So if we are not careful enough, the reader will not be able to explore and judge the analysis properly which can lead to doubts and a lack of trust.
Always questioning oneself is a good trait to have for data scientists. And we should always think about how to show what truly matters in an understandable as well as aesthetically pleasing way, while also remembering that context is important.
This is exactly what Alberto Cairo tries to convey in his teachings. He describes the Five Qualities of Great Visualizations, which are worth keeping in mind: beautiful, enlightening, functional, insightful, and truthful.
The following tips can help you choose the most suitable data visualization using Python.
Now that we have a basic understanding of design principles, let’s dive into some fundamental visualization techniques using the matplotlib library in Python.
All the code below can be executed in a Jupyter notebook.
%matplotlib notebook
# this provides an interactive environment and sets the back end. (%matplotlib inline can also be used but it’s not interactive. This means that any further calls to plotting functions will not automatically update our original visualization.)
import matplotlib.pyplot as plt # importing the required library module
The simplest matplotlib function to plot a point is plot(). The arguments represent X and Y coordinates, then a string value that describes how the data output should be shown.
plt.figure()
plt.plot(5, 6, '+')  # the + sign acts as a marker
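The string argument is a matplotlib format string, which can combine a color letter with a marker symbol. The following is a minimal sketch with arbitrary, illustrative coordinates; the variable names are not from the original code.

```python
import matplotlib.pyplot as plt

plt.figure()
# Format strings combine an optional color letter with a marker symbol.
red_dot = plt.plot(1, 1, 'ro')     # red circle
green_tri = plt.plot(2, 2, 'g^')   # green upward triangle
blue_plus = plt.plot(3, 3, 'b+')   # blue plus sign
```

Each call returns a list of line objects, so the chosen marker can be checked afterwards with get_marker().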
A scatterplot is a two-dimensional plot. The scatter() function also takes the X values as its first argument and the Y values as its second. Because y equals x in the example below, the points fall along a diagonal, and matplotlib automatically adjusts the size of both axes. Unlike plot(), scatter() does not treat the items as a single series, so we can also pass a list of colors, one for each point.
import numpy as np
x = np.array( [1, 2, 3, 4, 5, 6, 7, 8] )
y = x
plt.figure()
plt.scatter( x, y )
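Passing one color per point could look like the sketch below. The particular colors and the marker size are illustrative choices, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = x

# One color per point: the first seven are green, the last one is red.
colors = ['green'] * (len(x) - 1) + ['red']

plt.figure()
# s sets the marker area; c takes the list of per-point colors.
points = plt.scatter(x, y, s=100, c=colors)
```

The list passed to c must have the same length as x and y, otherwise matplotlib raises an error.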
A histogram is another method of data visualization in Python. It graphically represents the distribution of numerical data by dividing a continuous variable into bins and showing how many observations fall into each one. The X-axis of a histogram displays the bin ranges (a total bill in this case), and the Y-axis displays the count.
The syntax used for the histogram:
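import seaborn as sns
# data is assumed to be a pandas DataFrame with a numeric 'totalbill' column
sns.histplot(x='totalbill', data=data, kde=True)  # kde=True overlays a density curve
plt.show()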
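The same idea can be sketched with matplotlib alone. The values below are synthetic stand-ins generated with NumPy, not the dataset used above, and the bin count is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a numeric column (illustrative values only).
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=20.0, scale=5.0, size=200)

plt.figure()
# hist() returns the count in each bin, the bin edges, and the drawn bars.
counts, bin_edges, patches = plt.hist(values, bins=10, edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Count')
```

With 10 bins there are 11 bin edges, and the counts add up to the number of observations.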
A heatmap is a visualization method that encodes the values of a matrix as colors. It is commonly used to display a correlation matrix, time-series movements, temperature variations, or a confusion matrix, and it can reveal significant correlations in your data in a variety of contexts.
The syntax used for heatmaps:
import seaborn as sns
hm = sns.heatmap(data=data)  # data must be 2D and numeric, e.g. a correlation matrix
plt.show()
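As a rough sketch of a correlation heatmap, the example below builds a small correlation matrix from synthetic data and draws it with matplotlib's imshow() rather than seaborn. The column names a, b, and c and the color map are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: 'b' is constructed to correlate with 'a'; 'c' is independent.
rng = np.random.default_rng(seed=0)
a = rng.normal(size=100)
b = a + rng.normal(scale=0.5, size=100)
c = rng.normal(size=100)
labels = ['a', 'b', 'c']

# np.corrcoef treats each row as one variable, so stack the series as rows.
corr = np.corrcoef(np.vstack([a, b, c]))

fig, ax = plt.subplots()
# Fix the color scale to [-1, 1] so colors are comparable across plots.
image = ax.imshow(corr, cmap='coolwarm', vmin=-1, vmax=1)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
fig.colorbar(image, ax=ax, label='correlation')
```

The diagonal is always 1, because every variable correlates perfectly with itself, and the matrix is symmetric.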
A line plot is created with the plot() function. Like a scatter plot, it can show several series of data points, but it connects the points of each series with a line.
import numpy as np
linear_data = np.array( [1, 2, 3, 4, 5, 6, 7, 8] )
squared_data = linear_data**2
plt.figure()
# only y values are given, so plot() uses the index (0, 1, 2, ...) as the x values
plt.plot(linear_data, '-o', squared_data, '-o')
To make the graph more readable, we can add a legend that tells us what each line represents. A suitable title for the graph and labels for both axes are also important. In addition, any section of the graph can be shaded using the fill_between() function to highlight relevant regions.
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Line Plots')
plt.legend(['linear', 'squared'])
plt.gca().fill_between(range(len(linear_data)), linear_data, squared_data, facecolor='blue', alpha=0.25)
The modified graph now shows the axis labels, the title, the legend, and the shaded region between the two lines.
We can plot a bar chart by passing the X values and the height of each bar to the bar() function. Below is a bar plot of the same linear data array we used above.
plt.figure()
x = range(len(linear_data))
plt.bar(x, linear_data, width=0.3)  # narrow bars leave room for a second set
# to plot the squared data as another set of bars on the same graph, shift the new x values by one bar width so they sit beside the first set
new_x = []
for item in x:
    new_x.append(item + 0.3)
plt.bar(new_x, squared_data, width=0.3, color='green')
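The shifted x positions can also be computed with NumPy arithmetic instead of a loop. The sketch below is one possible variant, not the original author's code: it centers each pair of bars on its x position and adds tick marks and a legend.

```python
import numpy as np
import matplotlib.pyplot as plt

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
squared_data = linear_data ** 2

width = 0.3
x = np.arange(len(linear_data))

plt.figure()
# Shift each set by half a bar width so every pair is centered on its x position.
left_bars = plt.bar(x - width / 2, linear_data, width=width, label='linear')
right_bars = plt.bar(x + width / 2, squared_data, width=width,
                     color='green', label='squared')
plt.xticks(x)
plt.legend()
```

Because bar() centers each bar on the x value it is given, offsetting by half the width makes the two bars of a pair touch without overlapping.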
# for graphs with horizontal orientation we use the barh() function
plt.figure()
x = range(len(linear_data))
plt.barh(x, linear_data, height=0.3, color='b')
plt.barh(x, squared_data, height=0.3, left=linear_data, color='g')
# here is an example of stacking bar plots vertically
plt.figure()
x = range(len(linear_data))
plt.bar(x, linear_data, width=0.3, color='b')
plt.bar(x, squared_data, width=0.3, bottom=linear_data, color='g')
In addition to the basic techniques, some advanced techniques are as follows:
The visualization types don’t end here. Python also has a great library called seaborn, which is well worth exploring. Proper information visualization greatly increases the value of our data. For gaining insights and identifying trends and patterns, a visualization is usually a better option than scanning tables with millions of records.