K Means Clustering Matlab [With Source Code]
Updated on Feb 26, 2025 | 9 min read | 9.7k views
Share:
For working professionals
For fresh graduates
More
Updated on Feb 26, 2025 | 9 min read | 9.7k views
Share:
K-means clustering is one of the most commonly used techniques by data professionals. Due to the algorithm’s efficacy, it is demanded by numerous industries in various applications.
A data scientist’s job requires the implementation of Clustering in many stages. Many large-scale projects are currently based upon the clustering algorithm and have drastically raised the bar for the demand of data science professionals.
One of those algorithms is the K-means clustering, which is the basic idea of this article and its implementation with the MATLAB source code.
Before getting the topic’s hold, let’s have a quick look at what Clustering is, its significance, and how it can be implemented in real life. By the end of the post, you will come to know how crucial this algorithm is for understanding data in large sets.
Enrol for the Machine Learning Course from the World’s top Universities. Earn Masters, Executive PGP, or Advanced Certificate Programs to fast-track your career.
Data is the most critical component for any application, and a cluster is nothing but an accumulation of similar data points combined. As the name clearly defines, Clustering is the process of dividing a large chunk of data into subgroups or only clusters based on the data pattern.
In machine learning, Clustering is applied when there is no predefined data available. The ultimate aim is to group data into classes with high Intra-class similarity.
Clustering is used to explore data. Some real-life examples where it can be used are in market segmentation to find customers with similar behaviours, image segmentation/compression, document clustering with multiple topics, etc.
It is a requisite step before processing data to identify homogeneous groups for building supervised models. K-Means clustering is an unsupervised learning algorithm as we have to look for data to integrate similar observations and form distinct groups.
Let’s take a look at the K-Means algorithm, which is one of the most applied and the simplest clustering algorithms.
K-means clustering is one of the most desired unsupervised machine learning algorithms.
Unsupervised algorithms make conclusions from datasets using input vectors without referring to labelled outcomes.
It is an iterative distance-based or centroid-based algorithm that segregates the dataset into K distinct subgroups (clusters) where each data point belongs to one group. The similarity of the intra-cluster data points is increased, and the distance between the clusters is kept optimum.
The distance between the data points and the centroid of the cluster is kept at a minimum, such as Euclidean distance. In K-Means, each cluster is linked to a centroid. The primary aim is to minimise the distances between the points and the respective cluster centroid.
FYI: Free nlp course!
As the clustering process means several iterations to be performed, the K-Means algorithm has a unique way of working. Here is a step-by-step explanation of the way it works:
Step 1: Initially, define the number of clusters ‘K’.
Step 2: Initialise random K data points as centroids for each cluster.
If there are 2 clusters, the value of ‘K’ will be 2.
Step 3: Perform several iterations until the assigned data points to clusters do not change.
Step 4: Calculate the sum of the squared distance between data points and the centroids.
Step 5: Allocate each data point to the closest cluster (centroid) to minimise the distance.
Step 6: Take an average of the centroids of the clusters belonging to each other.
This is a single iteration process performed for computing the centroid and assigning the points to the cluster based on their distance from the centroid. Once all the centroids are defined, the process is stopped.
Statement: One of the famous food chains, McDonald’s wants to open a chain of outlets across California and want to find out the locations that will fetch them maximum revenue.
Ø A strong e-commerce presence
Ø Online customer data for analysing locations from where the orders are made frequently
All these points need a lot of analysis and mathematics to work on.
With a predefined value of K, the K-means algorithm can be implemented in the following steps:
Likewise, the K-Means Clustering algorithm can be applied to a variety of applications in varied scales. The hospitality industry, crime investigation departments, and image resizing, to name a few.
K-Means algorithm is implemented using many languages such as R, Python, MATLAB, etc. In the next section, we will look at how K-Means Clustering MATLAB is applied.
K-Means is a largely used algorithm used by many professionals dealing with data science, machine learning, artificial intelligence, cryptography, and cybersecurity.
The core objective of using this algorithm is to find out the centroid of each cluster. The data given to a programmer is heterogeneous. Here is the MATLAB code for plotting the centroid of each cluster and assign the coordinates of each centroid:
Code:
rng default; % For reproducibility
X = [randn(100,2)*0.75+ones(100,2);
randn(100,2)*0.5-ones(100,2)];
opts=statset(‘Display’,’final’);
[idx,C]=kmeans(X,4,’Distance’,’cityblock’,’Replicates’,5,’Options’,opts);
plot(X(idx==1,1),X(idx==1,2),’r.’,’MarkerSize’,12);
hold on;
plot(X(idx==2,1),X(idx==2,2),’b.’,’MarkerSize’,12);
plot(X(idx==3,1),X(idx==3,2),’g.’,’MarkerSize’,12);
plot(X(idx==4,1),X(idx==4,2),’y.’,’MarkerSize’,12);
plot(C(:,1),C(:,2),’Kx’,’MarkerSize’,15,’LineWidth’,3);
legend(‘Cluster 1′,’Cluster 2′,’Cluster 3′,’Cluster 4′,’Centroids’, ‘Location’,’NW’);
title(‘Cluster Assignments and centroids’);
hold off;
for i=1:size(C, 1)
display([‘Centroid ‘, num2str(i), ‘: X1 = ‘, num2str(C(i, 1)), ‘; X2 = ‘, num2str(C(i, 2))]);
end
Output:
MATLAB Window Showing Four Clusters and Respective Centroids
Results:
The centroids obtained are as follows:
K-means clustering is a versatile algorithm and can be used for many business use cases for any type of grouping. Some examples are:
Ø Behavioral Segregation:
Ø Image Scaling
Ø Sensor measurements:
Ø Determine bots or anomalies:
Ø Inventory classification:
Must Read: MATLAB Data Types
There’s a reason why top professionals prefer the K-Means clustering algorithm. Some benefits it offers:
K-means clustering is a broadly used approach for analysing data clusters. Once you gain command, it is easier to understand and apply and deliver results quickly.
We hope with this article; we could introduce you to this analysis technique. For any queries regarding the K-means algorithm, feel free to comment below.
Further, if this field of study interests you, have a look at our PG Diploma in Machine Learning and AI program which is specially curated for working professionals offering 30+ case studies & assignments, 25+ mentorship sessions from industry experts, 10 Practical Hands-on Capstone Projects, 450+ hours of learning and placement assistance.
Get Free Consultation
By submitting, I accept the T&C and
Privacy Policy
Top Resources