Hierarchical Clustering in Python [Concepts and Analysis]

Updated on 25 November, 2022

With the increase in the flow of raw data and the need to analyze it, unsupervised learning has grown popular over time. It is used to draw insights from datasets consisting of input data without labeled target values. Before we start discussing hierarchical clustering in Python and applying the algorithm to various datasets, let us revisit the basic idea of clustering.

Clustering mainly deals with the grouping of raw data. It groups together the data points that are most similar to each other. These groups are called clusters, and they are formed based on a defined similarity (or distance) metric.

Introduction

Hierarchical clustering organizes data in the form of a tree, a well-defined hierarchy. The process deals with two clusters at a time, and the algorithm relies on a similarity or distance matrix for its computational decisions: which two clusters to merge, or how to divide a cluster into two. These two options give us the two types of hierarchical clustering. If you are a beginner interested in learning more about data science, check out our data science courses from top universities.

One of the algorithm's critical aspects is the similarity matrix (also known as the proximity matrix), as the whole algorithm proceeds based on it. There are several proximity methods, which are discussed further along in the article.

Types

Hierarchical clustering has two types:

  1. Agglomerative clustering
  2. Divisive clustering

The two types reflect the way the hierarchy is developed: agglomerative is a bottom-up hierarchy generator, whereas divisive is a top-down hierarchy generator.

Agglomerative starts with every point as an individual cluster and then merges them, two at a time, on each iteration. Divisive starts by treating the entire dataset as one cluster and divides it until all points become individual clusters.

The result is a set of nested clusters that can be perceived as a hierarchical tree. The best way to view this structure is to convert it into a dendrogram.

The following gives a simple example of a dendrogram versus the cluster representation:

[Figure: dendrogram of the merge hierarchy (left) and the corresponding nested clusters over points 1-6 (right)]

Here, the clustering may work either way, but the result is a collection of nested clusters: the data points 1 through 6 are merged two at a time. The hierarchy formation can be seen in the dendrogram on the left, and the same view helps in understanding how the clusters were decided.

Deciding the number of clusters

One of the most useful features of this algorithm is that you may extract as many clusters as you want once it terminates. This is quite different from K-means, where we must pass the number of clusters as a hyperparameter: once the algorithm completes, we have exactly that many clusters, and if we need a different number later, the only option is to change the parameter and train the model again.

With hierarchical clustering, by contrast, you can set the number of clusters later. You can take the two clusters at the end or, if not satisfied, the five clusters formed a few levels earlier; it is up to you. Hence, once trained, you do not need to retrain the model to get more or fewer clusters. It can be accomplished by simply cutting the dendrogram at the level you desire.
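As a concrete illustration, SciPy's fcluster can cut a fitted hierarchy at different levels without recomputing anything. The following is a minimal sketch, assuming X is any (n_samples, n_features) NumPy array, such as the toy array introduced in the next section:

from scipy.cluster.hierarchy import linkage, fcluster

Z = linkage(X, method='ward')  # one run encodes the full merge history
labels_2 = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into exactly 2 clusters
labels_5 = fcluster(Z, t=5, criterion='maxclust')  # re-cut into up to 5 clusters, no retraining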

Now that we have the concepts down, let us discuss the workings of hierarchical clustering in Python.

For the experiment, we are going to use the scikit-learn library for the clustering algorithms. We will also use the dendrogram utilities in SciPy's scipy.cluster.hierarchy module to visualize and understand the "cutting" process for limiting the number of clusters.

import numpy as np

X = np.array([[3, 5],
              [12, 9],
              [13, 17],
              [14, 14],
              [60, 52],
              [55, 63],
              [69, 59]])

Plotted with plt.scatter(X[:, 0], X[:, 1]), these seven points form two definitive groups in opposite corners of the plane. Let us see if the algorithm can figure that out.

We will use the AgglomerativeClustering class from the sklearn.cluster module.

from sklearn.cluster import AgglomerativeClustering

# Note: scikit-learn 1.2 renamed `affinity` to `metric`, and 1.4 removed `affinity`.
cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')
print(cluster.fit_predict(X))

Here, we do specify the number of clusters, but unlike in K-means it does not constrain the training itself; we pass it only so the predicted classes come out clearly. We use the fit_predict function to train the model and predict the classes over X in one call.

It is important to note that agglomerative clustering is used far more often than divisive clustering, as it is simpler to execute: merging clusters based on a proximity matrix is easier than devising a mechanism to split a cluster into two.

Read: Scikit-learn in Python: Features, Prerequisites, Pros & Cons

To clearly understand what happened above, look at the steps involved in the algorithm:

Working of the algorithm

Here are the steps to execute agglomerative clustering (a minimal sketch in plain Python follows the list):

  1. Define each data point as a cluster
  2. Calculate the initial proximity matrix
  3. Merge the two clusters that are the "closest", or most similar, based on the metric
  4. Revise the proximity matrix and repeat the third step until a single cluster remains.
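To make these steps explicit, here is a purely illustrative, plain-NumPy sketch of the loop with single (MIN) linkage. It recomputes proximities naively on every pass, so it is meant only to mirror the four steps above, not to replace scikit-learn's implementation:

import numpy as np

def naive_agglomerative(X, n_clusters=1):
    # Step 1: every data point starts as its own cluster (lists of row indices).
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        # Steps 2 and 4: (re)compute the proximity between every pair of clusters.
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single (MIN) linkage: distance between the closest pair of points.
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        # Step 3: merge the two "closest" clusters.
        a, b, _ = best
        clusters[a].extend(clusters.pop(b))
    return clusters

On the toy array above, naive_agglomerative(X, n_clusters=2) returns the two corner groups as lists of row indices.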

So, here the only thing remaining to understand is the impact of different proximity methods. Mainly, there are four distance-based proximity methods in hierarchical clustering, plus objective-function approaches such as Ward's method. This is also known as inter-cluster similarity.

The methods (or linkages, as defined in code) include the following; a small numeric comparison follows the list:

  1. MIN or single linkage
  2. MAX or complete linkage
  3. Average linkage
  4. Centroid linkage
  5. Ward's method (an objective-function approach)
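To make the first four linkages concrete, here is a small sketch; the two toy clusters A and B are made up purely for illustration:

import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[1, 2], [2, 3]])   # one toy cluster
B = np.array([[8, 8], [9, 10]])  # another toy cluster
D = cdist(A, B)                  # all pairwise Euclidean distances

print('single (MIN):  ', D.min())   # closest pair across the clusters
print('complete (MAX):', D.max())   # farthest pair across the clusters
print('average:       ', D.mean())  # mean over all cross-cluster pairs
print('centroid:      ', np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))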

The results of the same can be easily visualized by applying the linkage option while creating the dendrograms.

To visualize the output of the model, we just need a small code snippet as follows:

import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=cluster.labels_, cmap='winter')
plt.show()

As you can see, there are two different clusters in the opposite corners. You may as well play around with the cluster numbers and observe the different results. The whole thing can be driven by cutting dendrograms. To understand that, let us write a small snippet to visualize how the dendrogram is created.

We are going to use the dendrogram and linkage functions from the scipy.cluster.hierarchy module. We define the linkage we want to use, then pass the resulting object to the dendrogram function to generate the hierarchy.

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

linked = linkage(X, 'complete')
labelList = list(range(1, 8))

plt.figure(figsize=(10, 7))
dendrogram(linked,
           orientation='top',
           labels=labelList,
           distance_sort='descending',
           show_leaf_counts=True)
plt.show()

Here, you can visualize how the clusters are formed on each iteration. You can cut the dendrogram at any level you want and end up with that many clusters. Hence, thanks to this hierarchy creation, you may vary the number of clusters after just one run through the algorithm and data. This is what gives hierarchical clustering an edge over algorithms like K-means.

Now, let us look at how to use hierarchical clustering in Python on a commonly used dataset: Iris. We will read the dataset from a local CSV file and glance at how it looks and what we need to classify.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

data = pd.read_csv('iris.csv')
data.head()

As you can see, the target variable is the 'variety' class. It is in string format and needs to be converted into numbers, as the model requires encoded labels. To do this, we use the LabelEncoder from sklearn's preprocessing module: a simple fit and transform converts the classes into numbers.

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(data['variety'])
data['variety'] = le.transform(data['variety'])

Now, if we create a dendrogram on this data, we can trace the various iterations of the merge. This is how it looks with single linkage; running the same code with complete or centroid linkage produces somewhat different dendrograms. The logic remains the same, but different linkages definitely affect the order in which clusters are merged.

from scipy.cluster.hierarchy import dendrogram, linkage

linked = linkage(data, 'ward')

plt.figure(figsize=(10, 7))
dendrogram(linked)
plt.show()
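The same snippet extends naturally to a side-by-side comparison; a sketch, assuming data is the encoded DataFrame from above, loops over the linkage methods just mentioned:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

for method in ['single', 'complete', 'centroid']:
    linked = linkage(data, method)
    plt.figure(figsize=(10, 7))
    plt.title(method + ' linkage')
    dendrogram(linked)
    plt.show()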

Now, applying clustering to the dataset, we will use two different linkages, and you will clearly see what difference the choice makes when defining the clusters. As we have already seen from the label encoder, there are 3 different classes, so we may start with 3 clusters.

from sklearn.cluster import AgglomerativeClustering

cluster = AgglomerativeClustering(n_clusters=3, affinity='euclidean', linkage='complete')
cluster.fit_predict(data)

plt.figure(figsize=(10, 7))
plt.scatter(data['sepal.length'], data['petal.length'], c=cluster.labels_)
plt.show()

As you can see from the figure above, in the 3-cluster classification, the linkages show visible changes in prediction. Look at the Ward linkage first: it predicts the labels correctly, keeping the clusters well defined, even though there is a small mix-up of values between two of them. But when we look at the complete linkage, it breaks down the large cluster and misclassifies some of the values.

As noted among the proximity methods, complete linkage does tend to break larger clusters, as we can see above. Ward's method and single linkage are less vulnerable to this issue. So much for simple datasets; let us see how the algorithm fares on a larger, noisier dataset.

One such dataset is the pulsar prediction, or HTRU2, dataset. It is larger, containing about 18,000 samples. From an ML perspective, that is a fairly regular size, or even small, but it is considerably heavier than the Iris dataset. Implementing on a varied dataset lets us analyze the performance of hierarchical clustering in Python. Let us first load the data:

pulsar_data = pd.read_csv('pulsar_stars.csv')
pulsar_data.head()

Before clustering, we need to normalize the dataset so that it does not get biased by extreme values.

from sklearn.preprocessing import normalize

pulsar_data = normalize(pulsar_data)  # scales each sample (row) to unit norm

We would be using the standard code, but this time, we are timing both computations.

%%time
from scipy.cluster.hierarchy import dendrogram, linkage

linked = linkage(pulsar_data, 'ward')

plt.figure(figsize=(10, 7))
dendrogram(linked)
plt.show()

Generating a dendrogram on the Iris dataset took 6 seconds; generating one on the HTRU2 dataset took 13 minutes 54 seconds. But this is nothing compared to the change in predictions caused by different linkages, which you can observe in the model trained on the HTRU2 dataset.

Let us follow the same procedure as we did before, this time making predictions with every linkage. The following snippet produces the clustering prediction for one linkage; swap the linkage argument to produce the others:

cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='average')  # also try 'complete', 'ward' and 'single'
cluster.fit_predict(pulsar_data)

plt.figure(figsize=(10, 7))
plt.scatter(pulsar_data[:, 1], pulsar_data[:, 7], c=cluster.labels_)
plt.show()
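To compare all four at once, a sketch along the following lines (assuming pulsar_data as above) plots each linkage's prediction side by side; it relies on the default Euclidean metric, which also sidesteps the affinity/metric rename:

import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

fig, axes = plt.subplots(1, 4, figsize=(20, 5))
for ax, method in zip(axes, ['single', 'complete', 'average', 'ward']):
    labels = AgglomerativeClustering(n_clusters=2, linkage=method).fit_predict(pulsar_data)
    ax.scatter(pulsar_data[:, 1], pulsar_data[:, 7], c=labels)
    ax.set_title(method + ' linkage')
plt.show()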

Yes, it is indeed surprising how much the predictions differ from each other. This shows the importance of the choice of proximity method in hierarchical clustering.

As you can see, single linkage lumps almost all the points together, because the minimum distance between two clusters defines its proximity metric; this makes it vulnerable to noisy data. Complete linkage does split the data into two clusters, but it may have broken a large cluster purely because of its proximity rule.

Average linkage is a trade-off between the two: it is less affected by noise and, while it may still break large clusters, it does so with lower probability. It also handles the classification better here.

Objective-function methods like Ward's are sometimes used to initialize other clustering methods, such as K-means. Like average linkage, Ward's method is a trade-off between the single and complete linkage methods, and it is mainly used in customized solutions to lessen the probability of misclassification. And we do see it performing well here.

Learn: Cluster Analysis in Data Mining: Applications, Methods & Requirements


Time and Space Complexity

Just to give an understanding, consider how the proximity metric is defined and calculated. The algorithm has to store the distance between every pair of clusters, which makes for a space complexity of O(n²). That is a large number. To put it in perspective, imagine we have 1,000,000 points: the matrix then holds on the order of 10¹² entries. Taking a rough and generous approximation of one byte per entry, we get a data size of about 1 TB. And this needs to be stored in RAM, not on the hard drive.
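A quick back-of-the-envelope check of that figure (the one-byte-per-entry size is the article's deliberately generous approximation; real float64 distances take 8 bytes each):

n = 1_000_000
entries = n * n                   # dense n-by-n proximity matrix
print(entries)                    # 10**12 entries
print(entries * 1 / 1e12, 'TB')   # ~1 TB at 1 byte per entry
print(entries * 8 / 1e12, 'TB')   # ~8 TB with float64 distances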

Second comes the time complexity. Scanning the proximity matrix at every iteration, over the roughly n merge steps the algorithm takes, gives a complexity of O(n³). It is computationally expensive, especially on big datasets.

It may be possible to bring this down to O(n² log n), but that is still expensive compared to other clustering algorithms, like K-means. If you want to learn more about analyzing the space and time complexity of algorithms and optimizing cost functions, you may head over to upGrad's Programs in Data Science and Machine Learning.

Limitations 

  • We have already discussed the first limitation: space and time complexity. Hierarchical clustering is clearly not favourable for big datasets. Even if the time complexity can be managed with faster machines, the space complexity remains too high, especially because the proximity matrix must sit in RAM. The speed issue worsens further when implementing hierarchical clustering in pure Python, which is slow for heavy computational tasks.
  • Secondly, there is no single optimal proximity method: each one has its own problems and limitations, which leaves the internal mechanism of the algorithm unoptimized.
  • Clustering decisions are not retractable. Once a merge or split has been applied at some iteration, it is never revisited before termination. So if, due to structural inaccuracies, the algorithm chooses the wrong clusters to combine or split at any point, the mistake is irrevocable.
  • Finally, if we look closely at the algorithm, there is no clear objective function being minimized. In other algorithms, there is a definite function we try to optimize; in K-means, for example, we minimize a clear cost function, which is not the case with hierarchical clustering.

Check out: Top 9 Data Science Algorithms Every Data Scientist Should Know

Conclusion 

Even though there are certain limitations when it comes to large datasets, this type of clustering algorithm is appealing for small to medium-scale datasets. The hierarchical clustering algorithm has not seen much architectural development, largely because of its steep time and space complexity.

And it is true that this is the era of Big Data, which demands algorithms that scale well. Still, in cases where we are not sure of the number of clusters, or where we need to refine the analysis efficiently, hierarchical clustering in Python can be a satisfactory choice.

With this, you now know how to implement hierarchical clustering in Python.

For understanding more such algorithms and applications of methods in Machine Learning and Data Science, do have a look at the course offerings by upGrad. We have comprehensive programs for any of the career paths that you want to follow.

The programs are curated by top professionals as well as the professors at IIIT-B. For more information, head over to upGrad. If you are curious about learning data science to stay at the forefront of fast-paced technological advancements, check out upGrad & IIIT-B's Executive PG Programme in Data Science.

Frequently Asked Questions (FAQs)

1. How to perform hierarchical clustering in Python?

Hierarchical clustering is a type of unsupervised machine learning algorithm used for labeling data points. It groups elements together based on the similarities in their characteristics. To perform hierarchical clustering, follow the steps below:

  1. Treat every data point as a cluster at the beginning, so the number of clusters at the start is K, where K is the total number of data points.
  2. Build a cluster by joining the two closest data points, so that you are left with K-1 clusters.
  3. Continue forming more clusters to arrive at K-2 clusters, and so on.
  4. Repeat this step until one big cluster is formed.
  5. Once you are left with a single big cluster, use a dendrogram to divide it into multiple clusters based on the problem statement.

This is the entire process for performing hierarchical clustering in Python.

2. Which are the two types of hierarchical clustering?

There are two main types of hierarchical clustering:

Agglomerative clustering: This method is also known as AGNES (Agglomerative Nesting) and uses the bottom-up approach. Every object starts as a single-element cluster; at each step, the two clusters with the most similar characteristics are combined into a bigger cluster, until a single big cluster remains.

Divisive hierarchical clustering: This method is also known as DIANA (Divisive Analysis) and follows the top-down approach, the inverse of AGNES. The root node consists of one huge cluster containing all the elements. At every step, the most heterogeneous cluster is divided, and the process continues until every element forms its own cluster.

3. Which type of hierarchical clustering algorithm is more widely used?

As you know, there are two types of hierarchical clustering algorithms: agglomerative and divisive. Of the two, the agglomerative algorithm is more commonly preferred for performing hierarchical clustering.
In this method, you group all the objects based on their similarities with the help of a bottom-up approach. Starting from single-point clusters, you build up to a single big cluster of nodes carrying similar characteristics.